Finding The Median Of Two Sorted Arrays
This is one of the beautiful cases where solving the general case first, and then applying it to a particular case is simpler and smoother than solving the particular case at once.
The problem:
Given two sorted arrays, “a” and “b”, with sizes “sa” and “sb” respectively, find the median of the union of these arrays.
Complexity requirements: O(log(sa + sb)) in the worst case, both time and space.
There are a few solutions for this problem, but non of them is as intuitive as the solution of a more general problem: The Select Problem:
Given two sorted arrays, “a” and “b”, with sizes “sa” and “sb” respectively, find the k-th smallest element of the union of these arrays. Under the same complexity requirements as above.
Once the select problem is solved, the median problem is solved as well, since the median is the ((sa + sb)/2 + 1)th smallest element (if sa+sb is odd), or the average of the two middle elements (if sa+sb is even).
So, let’s solve the select problem first:
The idea is to (virtually) partition “a” into two arrays: “a_left” and “a_right”, and the same for “b”: “b_left” and “b_right”, such that the size of “a_left” plus the size of “b_left” is exactly k.
Ideally, a_left will be a[0..(k/2-1)] and b_left will be b[0..(k/2-1)]. Unless, of course, a is smaller than k/2, in that case we’ll take “sa” elements from “a” (i.e. the whole array), and the rest of the elements (k – sa elements) from”b”.
Let’s assume that a_left has “na” elemetns, and b_left has “nb” elements. As we stated before, na+nb = k.
Trivial case 1:
k = 1. i.e., we need to find the smallest element of the union of “a” and “b”. Obviously, it’s either the first element of “a” or the first element of “b”. This operation is cheap. O(1)
Trivial case 2:
a[na-1] = b[nb-1]
In this case, the k-th smallest element is a[na-1] (or b[nb-1]): We have exactly (na + nb = k) elements smaller than it.
Case 3:
a[na-1] < b[nb-1]
This is the interesting case. If a[na-1] is smaller than b[nb-1], then:
1. The k-th smallest element cannot be in the “a_left” array:
All elements in “a_right” are greater than any of those in “a_left” (“a” is sorted). Total: sa – na elements
All elements in “b_right” are greater than any of those in “a_left” (“b” is sorted and a[na-1] < b[nb-1]). Total: sb – nb elements
At least one element in “b_left” is greater than any of those in “a_left” (we already know that b[nb-1] is such an element). Total: 1 (at least)
So, for any element in “a_left”, there are at least (sa – na + sb – nb + 1 = sa + sb – k + 1) elements which are greater than it.
The k-th smallest element must have exactly (sa + sb – k) elements greater than it. Hence, none of the elements in “a_left” can be the k-th smallest element.
2. What we do know about the elements in “a_left”, is that all of them are among the k smallest elements (but none is the k-th smallest) – Using the same logics above.
3. None of the elements in “b_right” are the k-th smallest element. We already know that all the elements in “a_left” and all the elements in “b_left” (total: k elements) are smaller than any element in “b_right”
From the three observation above, we can state that the k-th smallest element of the union of “a” and “b”, is the (k – na)th smallest element of the union of “a_right” and “b_left”: We excluded “a_left” in (1), excluded “b_right” in (3), and we already have “na” of the k smallest elements, so we need to find the (k-na)th smallest in the arrays not excluded.
The last case, where a[na-1] > b[nb-1] is similar to the previous case, except that “a” and “b” have exchanged there roles.
Here is the implementation in C of the above:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | #include <stdio.h> /* select: gets two sorted positive integer arrays, a & b, with sizes sa & sb respectively, and returns the k-th smallest element of their union. smallest element is referenced with k = 1 returns -1 on error */ int select(int *a, int *b, int sa, int sb, int k) { int ma = sa < k/2 ? sa - 1 : k/2 - 1; int mb = k - ma - 2; if (sa + sb < k) return -1; if (sa == 0) return b[k - 1]; if (sb == 0) return a[k - 1]; if (k == 1) return a[0] < b[0] ? a[0] : b[0]; if (a[ma] == b[mb]) return a[ma]; if (a[ma] < b[mb]) return select(a + ma + 1, b, sa - ma - 1, mb + 1, k - ma - 1); return select(a, b + mb + 1, ma + 1, sb - mb - 1, k - mb - 1); } /* median: uses select to find the median of the union of a and b (where a and b are sorted positive integer arrays of sizes sa and sb respectively. */ int median(int *a, int *b, int sa, int sb) { int m1, m2; if ((sa + sb) % 2 == 1) return select(a, b, sa, sb, (sa + sb)/2 + 1); m1 = select(a, b, sa, sb, (sa + sb)/2); m2 = select(a, b, sa, sb, (sa + sb)/2 + 1); return (m1 + m2) / 2; } int main() { int a[3] = {2, 4, 6}; int b[11] = {1, 3, 5, 7, 13, 17, 22, 23, 24, 25, 31}; printf("\n median is %d\n", median(a, b, 3, 11)); return 0; } |
Related posts:
Leave a Reply
About Me
Tags
Categories
- Algorithms
- Bash
- BlackBerry
- Collaboration
- Command Line
- Cool Tricks
- Easter Eggs
- Ebooks
- Firefox
- Hardware
- Humor
- iPhone
- Linux
- Linux Development
- Linux Kernel
- Networks
- Open Knowledge
- Other
- Productivity
- Programming
- Regular Expressions
- Science
- Security
- Shell Scripts
- Short Posts
- Social Networks
- Thoughts
- Tools
- Vim
- Web Development
- Websites
Popular Posts
Calendar
Archives
- July 2010 (5)
- June 2010 (1)
- May 2010 (1)
- April 2010 (3)
- March 2010 (1)
- January 2010 (1)
- December 2009 (2)
- September 2009 (13)
- July 2009 (1)
- June 2009 (6)
- May 2009 (4)
- March 2009 (18)
- February 2009 (10)
- January 2009 (10)
- December 2008 (7)
- November 2008 (8)
- October 2008 (1)
- August 2008 (1)
- July 2008 (1)
- June 2008 (2)
