Finding The Median Of Two Sorted Arrays

This is one of the beautiful cases where solving the general case first, and then applying it to a particular case is simpler and smoother than solving the particular case at once.

The problem:

Given two sorted arrays, “a” and “b”, with sizes “sa” and “sb” respectively, find the median of the union of these arrays.

Complexity requirements: O(log(sa + sb)) in the worst case, both time and space.

There are a few solutions for this problem, but non of them is as intuitive as the solution of a more general problem: The Select Problem:

Given two sorted arrays, “a” and “b”, with sizes “sa” and “sb” respectively, find the k-th smallest element of the union of these arrays. Under the same complexity requirements as above.

Once the select problem is solved, the median problem is solved as well, since the median is the ((sa + sb)/2 + 1)th smallest element (if sa+sb is odd), or the average of the two middle elements (if sa+sb is even).

So, let’s solve the select problem first:

The idea is to (virtually) partition “a” into two arrays: “a_left” and “a_right”, and the same for “b”: “b_left” and “b_right”, such that the size of “a_left” plus the size of “b_left” is exactly k.

Ideally, a_left will be a[0..(k/2-1)] and b_left will be b[0..(k/2-1)]. Unless, of course, a is smaller than k/2, in that case we’ll take “sa” elements from “a” (i.e. the whole array), and the rest of the elements (k – sa elements) from”b”.

Let’s assume that a_left has “na” elemetns, and b_left has “nb” elements. As we stated before, na+nb = k.

Trivial case 1:

k = 1. i.e., we need to find the smallest element of the union of “a” and “b”. Obviously, it’s either the first element of “a” or the first element of “b”. This operation is cheap. O(1)

Trivial case 2:

a[na-1] = b[nb-1]

In this case, the k-th smallest element is a[na-1] (or b[nb-1]): We have exactly (na + nb = k) elements smaller than it.

Case 3:

a[na-1] < b[nb-1]

This is the interesting case. If a[na-1] is smaller than b[nb-1], then:

1. The k-th smallest element cannot be in the “a_left” array:

All elements in “a_right” are greater than any of those in “a_left” (“a” is sorted). Total: sa – na elements

All elements in “b_right” are greater than any of those in “a_left” (“b” is sorted and a[na-1] < b[nb-1]). Total: sb – nb elements

At least one element in “b_left” is greater than any of those in “a_left” (we already know that b[nb-1] is such an element). Total: 1 (at least)

So, for any element in “a_left”, there are at least (sa – na + sb – nb + 1 = sa + sb – k + 1) elements which are greater than it.

The k-th smallest element must have exactly (sa + sb – k) elements greater than it. Hence, none of the elements in “a_left” can be the k-th smallest element.

2. What we do know about the elements in “a_left”, is that all of them are among the k smallest elements (but none is the k-th smallest) – Using the same logics above.

3. None of the elements in “b_right” are the k-th smallest element. We already know that all the elements in “a_left” and all the elements in “b_left” (total: k elements) are smaller than any element in “b_right”

From the three observation above, we can state that the k-th smallest element of the union of “a” and “b”, is the (k – na)th smallest element of the union of “a_right” and “b_left”: We excluded “a_left” in (1), excluded “b_right” in (3), and we already have “na” of the k smallest elements, so we need to find the (k-na)th smallest in the arrays not excluded.

The last case, where a[na-1] > b[nb-1] is similar to the previous case, except that “a” and “b” have exchanged there roles.

Here is the implementation in C of the above:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#include <stdio.h>
/*
select:
gets two sorted positive integer arrays, a & b, with sizes sa & sb respectively,
and returns the k-th smallest element of their union.
smallest element is referenced with k = 1
returns -1 on error
*/

int select(int *a, int *b, int sa, int sb, int k) {
    int ma = sa < k/2 ? sa - 1 : k/2 - 1;
    int mb = k - ma - 2;
    if (sa + sb < k)
        return -1;
    if (sa == 0)
        return b[k - 1];
    if (sb == 0)
        return a[k - 1];
    if (k == 1)
        return a[0] < b[0] ? a[0] : b[0];
    if (a[ma] == b[mb])
        return a[ma];
    if (a[ma] < b[mb])
        return select(a + ma + 1, b, sa - ma - 1, mb + 1, k - ma - 1);
    return select(a, b + mb + 1, ma + 1, sb - mb - 1, k - mb - 1);
}

/*
median:
uses select to find the median of the union of a and b (where a and b are sorted
positive integer arrays of sizes sa and sb respectively.
*/

int median(int *a, int *b, int sa, int sb) {
    int m1, m2;
    if ((sa + sb) % 2 == 1)
        return select(a, b, sa, sb, (sa + sb)/2 + 1);
    m1 = select(a, b, sa, sb, (sa + sb)/2);
    m2 = select(a, b, sa, sb, (sa + sb)/2 + 1);
    return (m1 + m2) / 2;
}

int main() {
    int a[3] = {2, 4, 6};
    int b[11] = {1, 3, 5, 7, 13, 17, 22, 23, 24, 25, 31};
    printf("\n median is %d\n", median(a, b, 3, 11));
    return 0;
}
If you enjoyed this post, make sure you subscribe to my RSS feed!

Post to Twitter Post to Delicious Post to Digg Post to Facebook Post to Reddit Post to StumbleUpon

Related posts:

  1. Regular Expression To Check For Prime Numbers

Tags: , ,

Saturday, March 6th, 2010 Algorithms

Leave a Reply

my email
my photo
Hi,
My name is Amir Watad. I have a BSc. in biomedical engineering from The Biomedical Engineering school , Technion , Israel, and a BSc. in electrical engineering from The Electrical Engineering school , Technion , Israel.
I'm a Verification Engineer in Mellanox Technologies Ltd.
I love Linux, the Command Line and the OpenSource Community.
I used to write Poems (Arabic) when I was able to find time for this.
March 2010
S M T W T F S
« Jan   Apr »
 123456
78910111213
14151617181920
21222324252627
28293031  
SEO Powered by Platinum SEO from Techblissonline

Twitter links powered by Tweet This v1.6.1, a WordPress plugin for Twitter.