Find median of two sorted arrays of different sizes

Median of two sorted arrays of different sizes

Given two sorted arrays of unequal sizes, find median of union of two arrays. Same problem has been solved for equal size arrays here  : Find median of two sorted arrays and Median of two sorted array : Merge based method.

Median of two arrays of different sizes

As per the definition of median, median provides middle of the range of the elements, that is, equal number of elements on left side are smaller than median and equal number elements on right side are larger.

Brute force algorithm will be to merge two sorted arrays and take middle or average of middle and middle + 1 element and return, based on whether union of two array is of odd or even size. That requires O(m+n) time  merge will at least take m+n operation.

Can we optimize the algorithm or modify it work more efficiently? Do we need to completely sort elements in order to get middle of resultant array? Answer is No. Please follow this post to understand how we find middle of two sorted arrays of unequal length in O(log(m+n)) time : Find Kth smallest element in two sorted arrays

Idea here is to find middle and middle+1 element using above algorithm and tada! we are done. Just return the average of two.

find median of two sorted arrays: Algorithm

1. Find the middle of two sorted array, if they are combined together.
2. If length of one array is odd and other even, return middle element. 
3. Else, find middle +1 element in same arrays.
4. Get the average of (middle and middle + 1) and return as median.

Only doubt will be that, whenever we are discarding parts of arrays, they might not be equal, so the resultant median of these sub arrays may not be equal to median of original arrays.
So, we need to find the minimum of k/2 or the current size left when function is invoked, that’s why you will see i and j are found using this information.

#include<stdlib.h>
#include<stdio.h>
 
#define MIN(a, b) a < b ? a : b
 
int findKthElement(int a[], int b[], int sizeA, int sizeB, int k){
    /* to maintain uniformaty, we will assume that size_a is smaller than size_b
    else we will swap array in call 🙂 */
    if(sizeA > sizeB)
        return findKthElement(b, a, sizeB, sizeA, k);
        /* Now case when size of smaller array is 0 i.e there is no elemt in one array*/
        if(sizeA == 0)
           return b[k-1]; // due to zero based index
 
        /* case where K == 1 that means we have hit limit */
        if(k == 1)
            return MIN(a[0], b[0]);
 
        /* Now the divide and conquer part */
 
        int i =  MIN(sizeA, k/2) ; // K should be less than the size of array  
        int j =  MIN(sizeB, k/2) ; // K should be less than the size of array  
 
        if(a[i-1] > b[j-1])
            // Now we need to find only K-j th element
            return findKthElement(a, b+j, i, sizeB-j, k-j);
        else
            return findKthElement(a+i, b, sizeA-i, j,  k-i);
 
        return -1;
}
 
float  findMedian(int a[], int b[], int sizeA, int sizeB){
    int left  =  (sizeA + sizeB + 1) >>1;
    int right =  (sizeA + sizeB + 2) >>1;
 
    //if sum of two sizes is odd, return the middle element
    if((sizeA + sizeB) & 0x01){
    	return findKthElement(a,b, sizeA, sizeB, left);
    }
    else{
        return (float) ((findKthElement(a,b, sizeA, sizeB, left) +
                      findKthElement(a,b, sizeA, sizeB, right))/2.0);
    }
}
int main(){
 
    int a[] = {10,30,40,50,60,70};
    int b[] = {30,55,100,110,115,135};
 
    //  int c [] = {10,30,30,40,50,55,60,100,110,115,135}
 
    int size_a  = sizeof(a)/sizeof(a[0]);
    int size_b  = sizeof(b)/sizeof(b[0]);
 
    printf("\nMedian is : %f", findMedian(a,b,size_a, size_b));
    return 0;
}

Output

 a = {10,30,40,50,60,70};
 b = {30,55,100,110,115,135};

 Median is : 57.5000

Complexity of code to find median of two sorted array with above algorithm is O(log (m+n)).

There is another method which uses merge part of merge sort. All we need to is to start merging two sorted array into third one. While doing so, keep track how many element have been sorted. Before that to figure out if we have to return mid or average of mid and mid + 1. This depends on sizes of individual arrays. If both arrays have even or odd length, then total number of elements will be even and hence we will return average of mid and mid+1, else return mid element.

Once number of elements sorted in correct position in merge steps equal to mid + 1, either return c[mid] or (c[mid] + c[mid+1])/2, based on decision taken above. Below is implementation of this method.

#include <stdio.h>
 
int findMedian(int a[], int b[], int n, int m){
    int i = 0, j = 0;
    int k = 0;
    int c[n+1];
    int mid = (n+m)/2;
    int totalOdd = 0;
 
    //find if n and m are both odd or even. 
    //Then return average of mid and mid+1
    if((n%2 && !(m%2)) || (!(n%2) && m%2)) { 
        totalOdd = 1;
    }
 
    while(i<n && j<m){
        if(a[i] < b[j])
            c[k++] = a[i++];
        else
            c[k++] = b[j++];
 
        /* We have stored n+1 elements 0 to n
        Hence c[k-1] is n+1th element while 
        c[k-2] is nth element
        median = average of n and n+1 elements */
        if(k == mid+1){
            if(totalOdd){
                return c[k-1];
            }
            else{
                return (c[k-1] + c[k-2])/2;
            }
        }
    }
 
    while(i<n){
        c[k++] = a[i++];
        if(k == mid+1){
            if(totalOdd){
                return c[k-1];
            }
            else{
                return (c[k-1] + c[k-2])/2;		
            }
        }
    }
 
    while(j<m){
        c[k++] = b[j++];
        if(k == mid+1){
           if(totalOdd){
              return c[k-1];
            }
            else{
              return (c[k-1] + c[k-2])/2;		
            }
        }
    }
}
 
int main(void) {
    int a[] = {10,20,30,50,60};
    int b[] = {70,80,90,100,110,120};
 
	// c = [10,20,30,50,60,70,80,90,10,110,120]
    int n =  sizeof(a)/sizeof(a[0]);
    int m =  sizeof(b)/sizeof(b[0]);
    printf("Median of two arrays : %d", findMedian(a,b,n,m));
 
    return 0;
}

Only drawback is we are using O((n+m)/2) extra space. It can be handled as explained here : Median of two sorted array : Merge based method

Please comment if you find something wrong or you have another way of implementing it.