# Blog

## Find Minimum Cost Path in a 2D Matrix: DP

The problem statement is: given a 2D matrix Cost[][], where Cost[i][j] represents the cost of visiting cell (i,j), find the minimum cost to reach a given cell (n,m), where a cell can be reached from its left (by moving one step right) or from its top (by moving one step down).

This problem is very similar to finding possible paths in a grid. As mentioned there, a grid problem reduces to smaller subproblems once we make a choice at a cell. In this problem, to reach a cell (i,j), we can either come from cell (i-1,j) above it, or from cell (i,j-1) to its left, whichever is cheaper.

```
CostToMove(i,j) = Min( CostToMove(i-1,j), CostToMove(i,j-1) )
                  + Cost(i,j)
```

This equation hints at recursion. What should be the terminating condition for the recursion? That is reached when we arrive at the origin cell (0,0), whose cost is just Cost(0,0).

```
findCost(i,j, cost) = cost(i,j) + Min( findCost(i-1,j, cost),
                                       findCost(i,j-1, cost) )
```

A recursive implementation of this recurrence to find the minimum cost to reach a given cell in the matrix:

```
#include <stdio.h>

int min( int a, int b){
    return (a > b) ? b : a;
}

int findMinimumCostPath(int Cost[3][3], int i, int j){

    if( i == 0 && j == 0 )
        return Cost[0][0];

    if( i == 0 )
        return findMinimumCostPath(Cost, i, j-1) + Cost[i][j];

    if( j == 0 )
        return findMinimumCostPath(Cost, i-1, j) + Cost[i][j];

    return min( findMinimumCostPath(Cost, i-1, j),
                findMinimumCostPath(Cost, i, j-1) ) + Cost[i][j];
}

int main()
{
    int M, N;
    M = N = 3;
    int Cost[3][3] = {
        {1,3,4},
        {5,3,2},
        {3,4,5}
    };
    printf("Minimum cost of path : %d",
           findMinimumCostPath(Cost, M-1, N-1));

    return 0;
}
```

The complexity of the recursive method is exponential, since the same subproblems are computed repeatedly.

Notice that this problem satisfies the two basic conditions for applying dynamic programming.

Optimal substructure: the optimal solution to the bigger problem is composed of optimal solutions to its subproblems.

Overlapping subproblems: subproblems, once computed, can be stored in a table for further use. This saves the time needed to compute the same subproblems again and again.

To find out more about these properties, please refer to Dynamic programming basics.

The solution to this problem depends on solutions to subproblems, and we can see that many subproblems are calculated again and again. So this problem is an ideal candidate for dynamic programming.

Let’s create a two dimensional matrix and fill it up bottom-up. Consider the topmost row: any cell in the topmost row can be reached only from its left.

`MinCost(0,j) = MinCost(0,j-1) + Cost[0][j]`

Similarly, any cell in the leftmost column can only be reached from the top.

`MinCost(i,0) = MinCost(i-1,0) + Cost[i][0]`

For all other cells,

`MinCost(i,j) = Min( MinCost(i-1,j), MinCost(i,j-1) ) + Cost[i][j]`

Implementation of the above equations is below:

```
#include <stdio.h>

int min( int a, int b){
    return (a > b) ? b : a;
}

int findMinimumCostPath(int Cost[3][3], int M, int N){

    int MinCost[M][N]; // the minimum cost table

    MinCost[0][0] = Cost[0][0];

    // initialize first row of MinCost matrix
    for (int i=1; i<N; i++){
        MinCost[0][i] = MinCost[0][i-1] + Cost[0][i];
    }

    // initialize first column of MinCost matrix
    for (int i=1; i<M; i++){
        MinCost[i][0] = MinCost[i-1][0] + Cost[i][0];
    }

    for (int i=1; i<M; i++){
        for (int j=1; j<N; j++){
            MinCost[i][j] = min(MinCost[i-1][j],
                                MinCost[i][j-1]) + Cost[i][j];
        }
    }

    return MinCost[M-1][N-1];
}

int main()
{
    int M, N;
    M = N = 3;
    int Cost[3][3] = {
        {1,3,4},
        {5,3,2},
        {3,4,5}
    };
    printf("Minimum cost of path : %d",
           findMinimumCostPath(Cost, M, N));

    return 0;
}
```

Complexity of the dynamic programming approach to find the minimum cost path in a grid is O(n²), with an additional space complexity of O(n²).

There is an extension to the problem: finding the actual path that leads to the destination. The solution is simple: start from the destination cell, which will be part of the path anyway, and keep moving to the cheaper of the cell to the left and the cell above, until you reach the origin cell.

There is one more variant of this problem, where we can also move diagonally, besides left-to-right and top-to-bottom. Nothing changes in the solution except that we take the minimum of three cells (left, top and diagonal).

Please share if there is something wrong or missing. If you want to contribute, please write to us.

# Counting Boolean Parenthesis

Let’s say there is a boolean expression: a string which contains True or False as operands, with a boolean operator (and, or, xor) between each pair of operands. An example of a boolean expression is T and F or T xor F. The boolean parenthesization problem asks for the number of ways in which this expression can be parenthesized so that it evaluates to True. For example:

```
Expression : T ^ F & T
Two ways : ((T ^ F) & T) and (T ^ (F & T))

Expression : T | T & F ^ T
Four ways : ((T|T)&(F^T)), (T|(T&(F^T))), (((T|T)&F)^T)
and (T|((T&F)^T))
```

Let’s understand the concept first, and then we will see how to implement it. If we start with an expression with only one boolean value, T or F, in how many ways can we parenthesize it? For T, there is one way, (T), whereas for F there is no way to parenthesize it so that it evaluates to True. The first takeaway from this insight is that we have the base case for a recursive solution. The second takeaway is that an expression or subexpression can evaluate to two different outputs, and we have to count both.

Let T(i,j) be the number of ways the expression from i to j can be parenthesized so that it evaluates to True. Similarly, F(i,j) is the number of ways it evaluates to False. From the above we know that:

```
T(i,i) = 1 if operand is T
         0 if operand is F

F(i,i) = 0 if operand is T
         1 if operand is F
```

How to calculate T(i,j)? Let’s try to split this problem into smaller subproblems and reach the base case. If we take an index k such that i ≤ k < j, we have two smaller subproblems to solve, (i,k) and (k+1,j). How can the solutions to these subproblems be combined to solve the bigger problem (i,j)? This is the tricky part, and it depends on the operator (and, or, xor) after operand k.

If there is an “and” operator between the two subexpressions (i,k) and (k+1,j), how can expression (i,j) be True? Only if both expression (i,k) and expression (k+1,j) are True. For the “and” operator, T(i,j) is:

`T(i,j) = Sum( T(i,k) * T(k+1,j) ) for all k such that i ≤ k < j`

How about expression (i,j) evaluating to False? Simple enough: at least one of the two expressions should evaluate to False.

`F(i,j) = Sum( Total(i,k) * Total(k+1,j) - T(i,k) * T(k+1,j) ) for all k such that i ≤ k < j`

What is Total(i,j) here?

```
Total(i,j) = Total(i,k) * Total(k+1,j) for a given split k
Total(i,k) = T(i,k) + F(i,k)
Total(k+1,j) = T(k+1,j) + F(k+1,j)

In short, Total(i,j) = T(i,j) + F(i,j)
```

When the operator between (i,k) and (k+1,j) is “or”, expression (i,j) is True unless both sides are False:

`T(i,j) = Sum( Total(i,k) * Total(k+1,j) - F(i,k) * F(k+1,j) ) for all k such that i ≤ k < j`

`F(i,j) = Sum( F(i,k) * F(k+1,j) ) for all k such that i ≤ k < j`

In the same vein, T(i,j) and F(i,j) when the operator is xor will be:

`T(i,j) = Sum( T(i,k) * F(k+1,j) + F(i,k) * T(k+1,j) ) for all k such that i ≤ k < j`

`F(i,j) = Sum( T(i,k) * T(k+1,j) + F(i,k) * F(k+1,j) ) for all k such that i ≤ k < j`

To solve the whole problem, all we need to find is T over the entire expression, i.e. T(0, n-1) with zero-based indexing. Now, let’s see how to implement the approach.

## Dynamic programming implementation for boolean parenthesization problem

```
#include <stdio.h>
#include <string.h>

int countNumberOfWaysToParenthesize(char operands[], char operators[], int n)
{
    int F[n][n], T[n][n];

    for (int i=0; i<n; i++){
        F[i][i] = (operands[i] == 'F') ? 1 : 0;
        T[i][i] = (operands[i] == 'T') ? 1 : 0;
    }

    for (int L=1; L<n; L++) {
        for (int i=0, j=L; j<n; ++i, ++j){
            T[i][j] = F[i][j] = 0;
            for (int count=0; count<L; count++){
                int k = i + count;
                int totalIK = T[i][k] + F[i][k];
                int totalKJ = T[k+1][j] + F[k+1][j];
                if (operators[k] == '&') {
                    T[i][j] += T[i][k] * T[k+1][j];
                    F[i][j] += (totalIK * totalKJ - T[i][k] * T[k+1][j]);
                }
                if (operators[k] == '|'){
                    F[i][j] += F[i][k] * F[k+1][j];
                    T[i][j] += (totalIK * totalKJ - F[i][k] * F[k+1][j]);
                }
                if (operators[k] == '^'){
                    T[i][j] += F[i][k] * T[k+1][j] + T[i][k] * F[k+1][j];
                    F[i][j] += T[i][k] * T[k+1][j] + F[i][k] * F[k+1][j];
                }
            }
        }
    }
    return T[0][n-1];
}

// Driver program to test above function
int main()
{
    char operands[] = "TTFT";
    char operators[] = "|&^";
    int n = strlen(operands);

    printf("\n Number of ways to parenthesize expression : %d",
           countNumberOfWaysToParenthesize(operands, operators, n));
    return 0;
}
```

We have two strings: operands[] represents all operands (T or F), and operators[] represents all operators (‘&’, ‘|’, ‘^’). operators[i] sits between operands[i] and operands[i+1] in the expression. The code above uses this representation and implements the method discussed earlier.

Complexity of this approach to count the ways to parenthesize a boolean expression so that it evaluates to True is O(n³), with O(n²) space complexity.

Please share if there is something missing or wrong. If you want to contribute to algorithms and me, please contact us.

# Matrix chain multiplication using dynamic programming

What is matrix multiplication? To read up on that, please refer to the Wiki.
However, today’s problem is not about actually multiplying matrices, but about finding an order in which to multiply them so that the number of scalar multiplications done is minimum.

There are two basic properties of a matrix: the number of rows and the number of columns, called the dimensions of the matrix.
Given N matrices, find an order in which these matrices are multiplied so that the total number of scalar multiplications is least, with the dimensions of each matrix given in an array P, where P[i-1] and P[i] denote the rows and columns respectively of the i-th matrix.

Matrix chain multiplication is a typical problem used to explain dynamic programming. The approach learnt here can be easily applied to many other problems.

Before going further, let’s understand some points about matrix multiplication:

```
1. Matrix multiplication is associative, i.e. A*(B*C) = (A*B)*C
2. It is not commutative, i.e. A*(B*C) is not equal to A*(C*B)
3. To multiply two matrices, they should be compatible, i.e. the
   number of columns in the first matrix should be equal to the
   number of rows of the second matrix.
```

Let’s get back to the problem at hand. Look at the problem this way: figure out a way to put parentheses around the matrices so that the total number of multiplications is least.

The brute force way is to try every possible combination of parentheses and check which one is minimum.
This approach can be implemented recursively: once we put a parenthesis, the problem reduces to putting parentheses around the remaining N-1 matrices. However, the code will be exponential in time, and hence of no use for even moderately large input.

Let’s see if dynamic programming fits. We can see that the cost of multiplying matrices Ai to Aj, split after matrix Ak, is:

`Cost(Ai, Aj) = Cost(Ai,Ak) + Cost(Ak+1,Aj) + (P[i-1] * P[k] * P[j])`

The idea is to find the k for which Cost(Ai, Aj) becomes minimum. If M[i,j] represents the minimum cost to multiply matrix i through matrix j, then:

`M[i,j] = M[i,k] + M[k+1,j] + (P[i-1] * P[k] * P[j])`

When calculating M[i,j], M[i,k] and M[k+1,j] should already be available. Also, M[i,i] = 0, as the cost of multiplying a single matrix is 0.

Looking at the requirement to calculate M[i,j], it depends on the length of the chain. The length of the chain from matrix i to matrix j is j - i + 1. We start with length L = 2 and go on to solve the table entries for increasing lengths; the entry for the full chain, M[1,N] for N matrices, gives us the final cost.

We will start with L = 2; the chain being considered has length 2, as j = i + L - 1 = 1 + 2 - 1 = 2. We start with i = 1 and the loop runs to i = N - L + 1, considering two matrices at a time.

The length being considered varies from L = 2 to L = N. Based on L, j also changes, and we find the minimum cost(i,j) for each i and j. To find the minimum cost(i,j), we need to find a k such that the expression

`Cost(Ai, Aj) = Cost(Ai,Ak) + Cost(Ak+1,Aj) + (P[i-1] * P[k] * P[j])`

becomes minimum. The table is filled bottom-up: M[1,2] is filled first, going up to M[1,N].

## Implementation for multiplication of matrix problem

```
#include <stdio.h>
#include <stdlib.h>

#define MAX_INT 10000000

int matrixMultiplication(int p[], int N){

    int L, i, j, temp;

    int M[N][N];

    for(i=0; i<N; i++){
        for(j=0; j<N; j++){
            M[i][j] = 0;
        }
    }

    for(L=2; L<N; L++){
        /* For every position i, we check every chain of length L */
        for(i=1; i<N-L+1; i++){

            j = i+L-1;
            M[i][j] = MAX_INT;
            /* For matrices i to j, check every split k */
            for(int k=i; k<j; k++){
                temp = M[i][k] + M[k+1][j] + p[i-1] * p[k] * p[j];
                /* Check if the current cost is less than the minimum */
                if(temp < M[i][j]){
                    M[i][j] = temp;
                }
            }
        }
    }
    /* Print the cost table */
    for(i=1; i<N; i++){
        for(int k=1; k<N; k++){
            printf("%d ", M[i][k]);
        }
        printf("\n");
    }

    return M[1][N-1];
}

/* Driver program to run above code */
int main(){

    int p[] = {10, 20, 30, 40, 30};
    int n = sizeof(p)/sizeof(p[0]);

    printf("%d\n", matrixMultiplication(p, n));

    return 0;
}
```

Let’s run through an example and understand how this code works.

```
P = {10, 20, 30, 40, 30}
dimensions of matrix [1] = 10x20
dimensions of matrix [2] = 20x30
dimensions of matrix [3] = 30x40
dimensions of matrix [4] = 40x30
```

The function is called with array P and its size, which is 5.
A table M of size N x N, i.e. M[5][5], is declared, because we will need to store up to M[1][4].

The diagonal of the table is set to zero, as the cost of multiplying a single matrix is 0:

`M[1][1] = M[2][2] = M[3][3] = M[4][4] = 0`

Start the outer loop with L = 2. The inner loop runs from i = 1 to N - L + 1 = 5 - 2 + 1 = 4, and j = i + L - 1, which is 2 for i = 1.
The innermost loop runs from K = 1 (i) to K < 2 (j). M[1,2] starts at MAX_INT.

`temp = M[1,1] + M[2,2] + P[0] *P[1]*P[2] = 6000.`

Since it is less than MAX_INT, M[1,2] = 6000.
Similarly, for i = 2, j = 2 + 2 - 1 = 3; the same process is followed and M[2,3] = 24000, and so on.

`M[3,4] = P[2] * P[3] * P[4] = 36000.`

Coming to L = 3: i = 1, j = 1 + 3 - 1 = 3, and K runs from 1 to K < 3.
For K = 1,

`temp = M[1,1] + M[2,3] + P[0] * P[1] * P[3] = 24000 + 8000 = 32000`

For K=2,

`temp = M[1,2] + M[3,3] + P[0] * P[2] * P[3] = 6000 + 12000 = 18000`

Hence M[1,3] = min(32000, 18000) = 18000.

With i = 2: j = 2 + 3 - 1 = 4, and K runs from 2 to K < 4.
For K = 2,

`temp = M[2,2] + M[3,4] + P[1] * P[2]*P[4] =  36000 + 18000 = 54000.`

For K =3,

`temp = M[2,3] + M[4,4] + P[1] * P[3] * P[4] = 24000 + 24000 = 48000.`

Hence M[2,4] = min(54000, 48000) = 48000.
The same process is followed for L = 4 and the remaining entries are filled. Finally, the table will look like this:

```
0 6000 18000 30000
0 0 24000 48000
0 0 0 36000
0 0 0 0
```

M[1, N-1], here M[1][4] = 30000, is the solution to our problem.

Time complexity of matrix chain multiplication using dynamic programming is O(n³), and space complexity is O(n²).
Reference:
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Dynamic/chainMatrixMult.htm

If you want to contribute to the website, please contact us. Please share if there is something wrong or missing.

## Longest common substring

Given two strings A and B, find the longest common substring. For example, if A = “DataStructureandAlgorithms” and B = “Algorithmsandme”, then the longest common substring of A and B is “Algorithms”.

The brute force solution is to generate all substrings of one string and check whether any of them is a substring of the second string, keeping track of the longest one found while comparing. There can be O(n²) substrings of a string of length n, and checking whether a string occurs in another takes on the order of m operations, where m is the length of the second string. Hence, the overall complexity of this method is O(n²m).

Can we do better than that?

# Longest common substring using dynamic programming

This solution is very similar to longest common subsequence. The difference between the two problems is that a subsequence is a collection of characters which may or may not be contiguous in the string, whereas for a substring the characters must be contiguous. Based on this difference, our solution varies a bit.

Let’s create a two dimensional array called LCS with dimensions n+1 and m+1. LCS[i][j] represents the length of the longest common substring ending at A[i-1] and B[j-1] (the indices are shifted by one so that row 0 and column 0 can represent the empty string).

As in the case of longest common subsequence, we start with the smaller cases first. What if one of the strings is empty? Then LCS = 0. Hence, LCS[i][0] = 0 and LCS[0][j] = 0.

How to fill LCS[i][j]?

```
1. Check if A[i-1] is equal to B[j-1]
   1.1 If yes, LCS[i][j] = 1 + LCS[i-1][j-1]
       (because the new character extends the common substring,
        if any, ending at A[i-2] and B[j-2])
   1.2 If the characters are not the same, LCS[i][j] = 0
       (because if the characters differ, there cannot be a
        common substring ending at A[i-1] and B[j-1])
2. Traverse the matrix and find the maximum element in it; that is
   the length of the longest common substring.
   (This step can be optimized by keeping track of the max length
    while calculating LCS[i][j].)
```

Implementation

```
#include <stdio.h>
#include <string.h>

int max(int a, int b){
    return a > b ? a : b;
}

int longestCommonSubstring(char * A, char * B){
    int lenA = strlen(A);
    int lenB = strlen(B);

    int LCS[lenA+1][lenB+1];

    for (int i=0; i <= lenA; i++){
        LCS[i][0] = 0;
    }

    for (int j=0; j <= lenB; j++){
        LCS[0][j] = 0;
    }

    int maxLength = 0;

    for (int i=1; i <= lenA; i++){
        for (int j=1; j <= lenB; j++){
            /* A[i-1] and B[j-1]: table indices are shifted by one */
            if (A[i-1] == B[j-1]){
                LCS[i][j] = 1 + LCS[i-1][j-1];
                maxLength = max( maxLength, LCS[i][j] );
            }
            else {
                LCS[i][j] = 0;
            }
        }
    }
    return maxLength;
}

int main(void) {
    char *a = "ABCDEFGSE";
    char *b = "EBCDEFGV";

    printf("\n Longest common substring : %d",
           longestCommonSubstring(a, b));

    return 0;
}
```

Time complexity of the dynamic programming approach to find the length of the longest common substring of two strings is O(n*m), and space complexity is O(n*m), where n and m are the lengths of the two given strings.

In the next post, we will discuss the suffix tree method to find the longest common substring, which is more optimized than the DP solution and can easily be generalized to multiple strings.

Please share if you find something wrong or missing. If you want to contribute to the site, please contact us. We would be happy to publish your work, and will pay you for it too.

# Apache Spark

Before understanding what Apache Spark is and how it helps in data pipelines, let’s understand the history behind big data and map reduce. Map reduce introduced the ability to process vast amounts of data on commodity hardware. Map reduce implementations inherently had fault tolerance, locality-aware scheduling and load balancing. The programming model supported acyclic flow of data. This is good for most kinds of applications; however, there are two big classes of applications which were not completely and satisfactorily solved by existing programming models and implementations of map reduce. Those application classes are:

• Iterative applications

Applications where the output of a stage is fed as input to a previous stage to refine results. Machine learning applications are usually of this nature, where the same datasets are processed again and again to refine a feature. Hadoop, the available map reduce implementation, was inefficient in handling this use case, because in Hadoop communication between two stages of a pipeline happens through the HDFS file system, that is, disk. A stage has to process data and then put it on disk, where the next stage can pick it up again. This leads to delays in processing, and there is no optimization to support iterative applications.

• Interactive applications

These are applications which are ETL in nature, with the requirement of a console where data is presented. The user can change the input and then ask for a different set of data based on it. Again, in Hadoop, each such request is treated independently, and every request fetches data from disk. This leads to delays, and the application does not remain interactive anymore.

To solve the challenges of these applications, different tools were introduced, like Apache Storm for stream processing or Hive and Pig for interactive usage of Hadoop, but none of these tools solved both problems at the same time. Also, they lacked an abstraction that would make them usable for any general purpose.

How does Apache Spark solve the challenges of these types of applications?

## Apache Spark : In memory data storage and processing

From the above analysis, we understood that the basic problem with the Hadoop implementation of map reduce is disk access. If data which would be read from disk can be read from memory, the system becomes much more efficient. Apache Spark does exactly that: all of the data is stored in memory as abstract entities called Resilient Distributed Datasets (RDDs). More on that later.

Two questions come to mind when we say data is in memory. First, how does fault tolerance work, since data may be lost if a node goes down? Second, what if the data is larger than the memory available at a node? Fault tolerance is achieved by maintaining metadata about how the RDDs on each node were created (the steps that led to their creation); if a node goes down, those steps can be re-run to recreate the RDDs and recover the data. More on how RDDs are created and used in Spark later.

Second, if the data is larger than the available memory, it is written to disk. This affects performance, but it is an edge case and not a normal working condition.

This was a brief introduction to Apache Spark. We will continue this thread in coming posts. Stay tuned.

If you see something is missing or wrong, please share and we will fix it.

# Consistent Hashing

To understand consistent hashing, we first have to understand traditional hashing and its limitations in large-scale distributed systems.

Hashing in plain terms is nothing but a key-value store, where, given a key, the associated value can be found very efficiently. Example: let’s say we want to find the name of a street in a city given its zip code. The idea is to store this information as a hash, as <Zip code, Street Name>.

The problem becomes more interesting when the data is too big to store on one node or machine, and multiple nodes or machines are required in the system to store it, for example a system which uses a number of web caches. First question: how to decide which key goes on which node? The simplest solution is to use a modulo function: given a key, take the hash of the key, take it modulo the number of nodes in the system, and put the key on that node. Similarly, when fetching a key, hash it, take the modulo with the number of nodes, and fetch the value from that node.

Failures are common in commodity-hardware-driven distributed multi-node systems. Any node can die without any prior notice, and the expectation is that the system runs unaffected, with a slight cost in performance. What happens when a node goes down in the system described above? In traditional hashing, the total number of nodes has decreased, so the function determining which node a key is put on or fetched from changes. New keys will work fine. What happens to the existing keys? They are all on the wrong nodes as per the new function.

To fix this problem, we have to redistribute all existing keys over the remaining nodes, which can be a very costly operation and can have detrimental effects on a running system. And what happens when the node comes back? Well, repeat what was done when it went down. This may result in a thrashing effect: if a node misbehaves regularly, the system does no useful work except redistributing keys.

How to solve this challenge? This is where consistent hashing comes into the picture.

## Consistent hashing in distributed multi-node systems

Consistent hashing comes with a very simple idea: put nodes and keys in the same id space, unlike traditional hashing where node ids and keys are in two different id spaces. A node’s id can be a hash of its IP address, and the same hash function is then applied to keys to determine which node a key goes to, or is fetched from.

A critical requirement for a consistent hashing implementation is a hash function which is consistent irrespective of each machine’s view of the system, and which maps keys roughly uniformly across all machines.

Choose any base hash function that maps the key space to integers in the range [0..M]. Once we divide by M, we get a unit circle. Now, each key, once hashed, represents a point on this unit circle.

How does a key map to a node exactly? The key is hashed, and then assigned to the first node found while moving clockwise from that point on the circle. Simple enough. To find a key, take its hash and go to the first node found while moving clockwise on the unit circle.

How does it solve the problem of scale? Let’s say my system receives 5x the load; what happens to the nodes, and how can I balance or reduce the load? Simply add more nodes, uniformly distributed on the unit circle, and the problem is solved. Consistent hashing is built for scale.

What happens when a node goes down? All the keys which were on that node are reallocated to the next successor node on the circle. All other keys remain unchanged. This is far more optimal than the case where we have to redistribute all keys on the failure of one node.

As mentioned earlier, we assume the hash function distributes keys uniformly across nodes, which is not realistic. To reduce non-uniformity, virtual nodes are introduced: each node is hashed with K different hash functions, which map it to K different points on the circle. Each node still gets roughly 1/N of the keys, but virtual nodes reduce the variance of key load significantly.

One challenge still remains: how to efficiently find the successor node for a given key, that is, the node s with the smallest h(s) > h(k) for key k. An intuitive solution is a hash table, but hashes do not maintain any ordering information. A binary search tree maintains ordering, but the successor operation is proportional to the depth of the tree, which can be O(N); using a balanced binary search tree such as a red-black tree reduces this to O(log N).

### Where all consistent hashing is used?

Consistent hashing is used in Memcached, Cassandra, Amazon Dynamo, etc.

If you find this article useful, please share it. If there is something missing or wrong in the article, please let us know and we will correct it.

Reference:

http://theory.stanford.edu/~tim/s16/l/l1.pdf

http://www8.org/w8-papers/2a-webserver/caching/paper2.html

# Merge Sort Algorithm

We can classify sorting algorithms based on their complexity: selection sort, bubble sort and insertion sort have complexity O(n²), while heap sort, merge sort and quicksort (with some assumptions) have complexity O(n log n), and counting and radix sorts are linear, O(n), algorithms. In this post I will concentrate on one sorting algorithm: merge sort.

Merge sort falls in the divide and conquer class of algorithms, where the input space is divided into subproblems, the subproblems are solved individually, and their solutions are combined to give the solution to the original problem.

In merge sort, the input space is divided into two subproblems until we reach the smallest subproblem, which is then solved (that is, sorted) and combined upwards until all of the original input is sorted. What is the smallest subproblem? It is the condition where the input cannot be divided further; for an array of integers, this is met when there is only one element in the array.

From the above explanation, it is clear that subproblems are subjected to the same processing as the original input. That rings a bell for recursion, and the base condition to stop the recursion is derived above: there is only one element in the array.

Now, how do we combine the solutions to two subproblems? This is the conquer part: combine the sorted halves with a merge operation. In the merge operation, we scan both sorted arrays and, based on comparison, put one of the two items into the output array, until both arrays are scanned. If one array finishes before the other, copy all remaining elements from the other array to the output array. Copy this output back to the original input array so that it can be combined with bigger subproblems until the solution to the original problem is derived.

Let’s take an example, work out merge sort on it, and then implement it.
## Merge sort implementation
The mergesort function gets three parameters: the input array a[], the start index and the end index of the array. Start and end change in every recursive invocation.
We find the middle index using the start and end indices, and invoke the same function on two parts: one from start to mid, and the other from mid+1 to end.
Once the base condition is hit, we start winding up and call the merge function. The merge function takes four parameters: the input array, and the start, middle and end indices.
The merge function merges the two sorted subarrays (one from start to mid, the other from mid+1 to end) into a single sorted array from start to end. This array is then returned to the upper calling function, which in turn merges it with another sorted part of the array.
## Complexity analysis
Every time, the input space is divided into two halves; as we know from binary search, this division produces O(log n) levels, where n is the input size. At each level, the merge step places every element in its proper place in the array, which takes linear time, O(n). Since O(n) merging work is done at each of the O(log n) levels, the total complexity of merge sort is O(n log n).
However, there is one caveat: every time we merge two subarrays, an auxiliary array is needed to temporarily store the elements, which incurs O(n) space complexity on the merge sort algorithm.

There are some improvements which can be done on this algorithm.
1. When the number of elements is less than some threshold, one can use insertion or selection sort to sort those numbers. Why? Because when n is small, O(n²) is as good as O(n log n), and it saves the extra overhead of splitting and merging. All in all, using insertion sort on small subarrays can give better performance.

2. Before merging, check if all elements in the right array are greater than those in the left array; if yes, there is no need to merge. This can be easily checked by comparing a[mid] with a[mid+1]: if a[mid] is less than or equal to a[mid+1], the two subarrays are already in sorted order and we can skip the merge.

## External Merge sort

Merge sort is best used when the data is huge compared to the available memory, like sorting 2 GB of data when we have only 100 MB of RAM. The data cannot fit in memory and resides on disk, from where it is fetched in chunks and merged.
There are two passes in external merge sort: the sort pass and the merge pass.

Sort pass

• Divide the input into N chunks, where N = size of total data / size of available memory.
• Load each chunk into main memory, sort it with any conventional sorting algorithm, and write it back to disk.
• Then load a predefined block of each sorted chunk into memory again, keeping a buffer to store the sorted output.

Merge Pass

• Do an N-way merge and put the output into the buffer. As the buffer gets full, write it out to disk.
• If the loaded block of any chunk is exhausted, fill the next block from the same chunk.

External merge sort can be improved significantly using parallelism: data chunks can be written to different disks, where reads and writes can be performed in parallel, and different chunks can be sorted on different processors in parallel. If extra processors are not available, the merge routine can take advantage of multithreading.
There is one classic problem which uses external merge sort; it is mentioned here as problem number 5 and solved here: External merge sort.

If you have any other optimization trick or better way to implement merge sort please comment below.