Median of Stream of numbers

Median of stream of numbers

Given a continuous stream of integers, find median of stream of numbers received so far. Median of array of integers is middle element of array if number of elements in array are odd and average of middle elements if number of elements is even.

Median of stream of numbers can be queried at multiple times at different point of time. Insertion and median functions can be called in any order. For example:

add(1)
add(2)
findMedian() -> 1.5 Median of stream of numbers till now
add(3) 
findMedian() -> 2 Median of stream of numbers till now

Since, this algorithm which finds median at a given point of time for a stream of integer is called online algorithm.

First solution be to store the stream received in an unsorted array. Finding median of stream of numbers will take O(N log N) (because we have to sort array first before returning median), time complexity for insertion will be O(1).

How about storing integer in a sorted array? Insertion will take O(N) as we need to shift elements once correct place is identified for new integer. Median can be found O(1).

Storing stream in linked list will have same complexity as of unsorted array.

Consider some advance data structure. Here, we need some kind of order of elements. We can divide integers in two groups:  all integers less than median, let it be group 1 and all elements  greater than median, call it group 2.
So greatest integer in group 1 (less than median) should be less than least integer in group 2. Now, problem reduces to find greatest integer in one group and smallest on other side. Which data structure gives that kind of information in constant time? Well, answer is heaps.

Maximum element in group 1 can be found using max heap, while minimum element in group 2 can be found using min heap in constant time. We require two heaps. Max heap and min heap. Max heap contains all integers less than current median and min heap contains integers greater than current median.

How to insert an new integer in heaps and what invariant they should always possess. Since, median is middle of all integers if number for elements are odd, we need to keep size difference of two heaps either equal  0 or  1. If size difference becomes more than 1, adjust elements in two heaps.

If size of max heap is 2 more than min heap, extract maximum element from max heap and put it in min heap.
If size of min heap is 2 more than max heap, extract minimum element from min heap and put it in max heap.

Now how to calculate median of stream of numbers? It’s simple again!

If number of elements in both heaps are equal, take average of max and minimum elements in heaps and return as median of stream of numbers till now. If sizes of two heaps are different, we select median as root of heap which contains more elements.

Median of stream of numbers implementation in C

#include <stdio.h>
#define MAX_SIZE 10
 
typedef struct heap {
    int size;
    int items[MAX_SIZE];
}Heap;
 
int parent (int i){
	return (i-1)/2;
}
 
int leftChild(int i){
	return (i*2)+1;
}
int rightChild(int i){
	return (2*i)+2;
}
int swap(int a[], int i, int j){
	int temp = a[i];
	a[i] = a[j];
	a[j] = temp;
}
 
void moveUp(int heap[], int size, short type){
    int current = size-1;
    int p = parent(size);
 
    if(type == 0){
    	while(p >=0 && heap[p] < heap[current]){
    	/*If we parent is greater than the current element, 
    	then swap parent with current */
    	    swap(heap,p, current);
    	// move up the tree.
    	    current = p;
    	    p = parent(current);
    	}
     }
     else{
 	while(p >=0 && heap[p] > heap[current]){
    	/*If we parent is greater than the current element, 
    	then swap parent with current */
    	    swap(heap,p, current);
    	    // move up the tree.
    	    current = p;
    	    p = parent(current);
    	}	
     }
}

void heapify(int a[], int i, int length){
    int largest = i;
    int left, right;
 
    left  = leftChild(i);
    right = rightChild(i);
 
    if(left < length && a[i] < a[left]){
        largest = left;
    }
    if(right < length && a[largest] < a[right]){
       largest = right;
    }
    if(largest != i){
        swap(a, i, largest);
        heapify(a, largest, length);
    }
}

void minHeapify(int a[], int i, int length){
	int smallest = i;
	int left, right;
 
	left  = leftChild(i);
	right = rightChild(i);
 
	if(left < length && a[i] > a[left]){
		smallest = left;
	}
	if(right < length && a[smallest] > a[right]){
		smallest = right;
	}
	if(smallest != i){
		swap(a, i, smallest);
		minHeapify(a, smallest, length);
	}
}

int delete(int heap[], int *size, short type){
 
    int currentSize = *size;
    int deletedValue = -1;
    
    if(currentSize < 1)
        return -1;
    else{
	deletedValue = heap[0];
	swap(heap, 0, currentSize-1);
	currentSize--;
	if(type == 0){
	    heapify(heap, 0, currentSize);
	}
	else{
	    minHeapify(heap, 0, currentSize);
	}
     }
     *size = currentSize;
     return deletedValue;
}
 
 
void insert(int heap[], int *size, int value, short type){
    heap[(*size)++] = value;
    moveUp(heap, *size, type);
}
 
void insertInMaxHeap(Heap *maxHeap, int val){
	insert(maxHeap->items,&(maxHeap->size),val, 0);
}
void insertInMinHeap(Heap *minHeap, int val){
	insert(minHeap->items,&(minHeap->size),val, 1);
}
 
int deleteFromMaxHeap(Heap *maxHeap){
	return delete(maxHeap->items,&(maxHeap->size), 0);
}
 
int deleteFromMinHeap(Heap *minHeap){
	return delete(minHeap->items,&(minHeap->size), 1);
}
 
/*
1. Add to max heap.
2. Check if max_heap.count- min_heap.count >=2 
   remove top of max heap and add it to min_heap
*/
void addNumber(Heap *minHeap, Heap *maxHeap, int value){
 
    // Insert in max heap 
    insertInMaxHeap(maxHeap,value);
 
    /* If difference between two heap size is more than 1
    or the maximum element in more than least element in min heap
    then remove from max heap and add to min heap*/
    if((maxHeap->size - minHeap->size) >1 ||
        (minHeap->size != 0 && maxHeap->items[0] > minHeap->items[0])){
 
        int top = maxHeap->items[0];
 
        deleteFromMaxHeap(maxHeap);
        insertInMinHeap(minHeap,top);
    }
    /* If difference between two heap size is more than 1 */
    if((minHeap->size !=0 && (minHeap->size - maxHeap->size) > 1)){
        int top = minHeap->items[0];
        deleteFromMinHeap(minHeap);
        insertInMaxHeap(maxHeap,top);
    }
}
 
void printMedian(Heap minHeap, Heap maxHeap){
 
    if(maxHeap.size == minHeap.size){
        printf("\n Median  %d", (maxHeap.items[0] + minHeap.items[0])/2 );
    }
    else{
        if(maxHeap.size > minHeap.size){
            printf("\n Median : %d ", maxHeap.items[0]);
        }
        else{
            printf("\n Median : %d ", minHeap.items[0]);
        }
    }
}
//Driver program
int main(){
 
    Heap maxHeap, minHeap;
    maxHeap.size = 0;
    minHeap.size = 0;
 
    for(int i=1; i<=20; i++){
    	if(i%4 == 0){
    		printMedian(minHeap, maxHeap);
    	}
    	addNumber(&minHeap,&maxHeap,i);
    }
    return 0;
}

Execution trace of program to find median of stream of numbers. Please take pen and paper and try to go through the flow.

============================================
 Processing element : 1
 Max heap now: 1 
============================================
 Processing element : 2
 Max heap now: 2 1 
Size of max heap greater than min heap
Removing max element 2 from max heap 
Adding max element 2 from max heap to min heap
 Min heap now :2 
============================================
 Processing element : 3
 Max heap now: 3 1 
Top of max heap is greater than top of min heap
Removing max element 3 from max heap 
Adding max element 3 from max heap to min heap
 Min heap now :2 3 
 Median : 2 
============================================
 Processing element : 4
 Max heap now: 4 1 
Top of max heap is greater than top of min heap
Removing max element 4 from max heap 
Adding max element 4 from max heap to min heap
 Min heap now :2 3 4 
Size of min heap greater than max heap
Removing min element 2 from min heap 
Adding min element 2 from min heap to max heap
 Max heap now: 2 1 
============================================
 Processing element : 5
 Max heap now: 5 2 1 
Top of max heap is greater than top of min heap
Removing max element 5 from max heap 
Adding max element 5 from max heap to min heap
 Min heap now :3 4 5 
============================================
 Processing element : 6
 Max heap now: 6 2 1 
Top of max heap is greater than top of min heap
Removing max element 6 from max heap 
Adding max element 6 from max heap to min heap
 Min heap now :3 4 5 6 
Size of min heap greater than max heap
Removing min element 3 from min heap 
Adding min element 3 from min heap to max heap
 Max heap now: 3 2 1 
============================================
 Processing element : 7
 Max heap now: 7 3 1 2 
Top of max heap is greater than top of min heap
Removing max element 7 from max heap 
Adding max element 7 from max heap to min heap
 Min heap now :4 6 5 7 
 Median : 4 
============================================
 Processing element : 8
 Max heap now: 8 3 1 2 
Top of max heap is greater than top of min heap
Removing max element 8 from max heap 
Adding max element 8 from max heap to min heap
 Min heap now :4 6 5 7 8 
Size of min heap greater than max heap
Removing min element 4 from min heap 
Adding min element 4 from min heap to max heap
 Max heap now: 4 3 1 2 
============================================
 Processing element : 9
 Max heap now: 9 3 4 2 1 
Top of max heap is greater than top of min heap
Removing max element 9 from max heap 
Adding max element 9 from max heap to min heap
 Min heap now :5 6 8 7 9 
============================================
 Processing element : 10
 Max heap now: 10 3 4 2 1 
Top of max heap is greater than top of min heap
Removing max element 10 from max heap 
Adding max element 10 from max heap to min heap
 Min heap now :5 6 8 7 9 10 
Size of min heap greater than max heap
Removing min element 5 from min heap 
Adding min element 5 from min heap to max heap
 Max heap now: 5 3 4 2 1 
============================================
 Processing element : 11
 Max heap now: 11 3 5 2 1 4 
Top of max heap is greater than top of min heap
Removing max element 11 from max heap 
Adding max element 11 from max heap to min heap
 Min heap now :6 7 8 10 9 11 
 Median : 6 

Complexity of finding median of stream of numbers is O(1) and insertion will be O(log N).

Please share if something is wrong or missing. Also, if you are interested to contribute to site, please refer to Publishing at Algorithms and Me and contact us. We would love to have you as contributor.

  • Holden

    Thank you for your great explanation. I have a doubt. How is this possible?
    “Array will be unsorted. Finding median will take O(N)”
    We need to sort the array first. without sorting how can we find median in O(n)?

    • rajeev

      probably a partition of the partition sort might be used? do it until you find the mid index (mid and next index in case of even) this would give median in O(n) time. provided we randomly select the pivot

      • http://algorithmsandme.com/ Jitendra Sangar

        Thanks Rajeev for explanation.

  • Holden

    There are 2 minor typos in last line:

    If size of min heap is 2 more than min heap, extract minimum element from max heap and put it in max heap. —>
    If size of min heap is 2 more than max heap, extract minimum element from min heap and put it in max heap.

  • Holden

    About this statement:

    “If sizes of two heaps different, then maximum element of max heap will be the median.”

    Is it true for for example this sample:
    10, 20, 30 – 40, 50, 60, 70
    Median should be 40, but based on the statement it is 30.
    I think it should be: “When the heaps do not have equal size, we select the median from the root of heap containing more elements.”

  • http://www.waytocrack.com kumar

    nice explanation : have a look at my code with very similar logic http://waytocrack.com/forum/index.php/368/median-in-a-stream-of-integers