Blog

Abstract class Vs Interface

Difference between abstract class and interface

To understand difference between abstract class and interface, first, we have to understand what is an abstract class and an interface and what are the similarities.

Abstract class in plain terms is a class which contains one more abstract methods. Abstract method has no implementation, just a declaration. Sub-classes of abstract class are responsible for providing implementation of such methods. Example of an abstract class as follows

package Examples;

/**
 * Created by sangar on 22.10.17.
 */
public abstract class AbstractBicycle {
    //member fields
    protected int gears;
    protected int topSpeed;
    protected int serviceCost;

    public int getGears(){
        return this.gears;
    }

    public abstract void service();
}

 

On the other hand, interfaces are like contracts which are honored by every class which implements them. It does not matter how a class implement the contract, as far as it implements it. For example, Television box can have may functionalities like changing channels, volume control, records, selection of source etc. TV manufacturers defined this industry standard interfaces, so that remotes can be build to call these behaviors on TV sets. It does not matter how different models of TV implement these interface, as far as they implement it.

Interface is again just like a class, which contains all abstract methods.  Implementation of those methods is left to class  which implements interface. Interface can contain constants too which are by default final. Example of declaration of an interface be

package Examples;

/**
 * Created by sangar on 22.10.17.
 */
interface IBicycle {

    void changeGear(int newValue);

    void speedUp(int increment);

    void applyBrakes(int decrement);
}

So, what’s the difference between abstract class and interface then? There are very subtle differences as follows

  1. Abstract class is object oriented in nature as it defines hierarchy between classes, on the other hand interfaces are functional in nature as they define what a class has to implement to satisfy this interface.
  2. Methods and fields in abstract class can have individual access specifiers like private, public or protected, where as in interfaces, everything has to be public. Sub-class implementing abstract methods from abstract class can keep visibility same or reduce it, whereas class implementing interface has to keep visibility same which is public.
  3. A sub-class or child can only extend single abstract class, whereas a class can implement multiple interfaces.
  4. By definition, abstract class can have default implementation of some methods, where as interface just has definition of all methods. This is not true from Java 8 default method signatures in interface. Refer here for more details.
  5. Sub-class uses extends to use abstract class where implements to implement an interface.
  6. Abstract classes can have constructors but interfaces can’t have constructors.

Abstract class Vs Interface : When to use what?

Now, when to use what? When to use abstract class and when to go to interfaces.

Consider using abstract classes if any of these statements apply to your situation:

  • You want to share implementation among several closely related classes.
  • You expect that classes that extend your abstract class have many common methods or fields, or require access modifiers other than public (such as protected and private).
  • You want to declare non-static or non-final fields. This enables you to define methods that can access and modify the state of the object to which they belong.

Consider using interfaces if any of these statements apply to your situation:

  •  If unrelated classes would implement your interface.
  • Specify the behavior of a particular data type, but not concerned about who  implements its behavior.
  • You want to take advantage of multiple inheritance of type.

Reference : Oracle documentation

In short, abstract class provides is-a kind of relationship between class and concrete sub-classes whereas interface provides has-a capability between unrelated classes.

This post clarifies difference between abstract class and interface and explains when to use what. Hope it helped you!

Please share if there is something wrong or missing. If you want to contribute to website, please reach out to us through contact form or drop a mail to jitsceait@gmail.com.

 

Object oriented programming : Classes

What is a class?

In object oriented programming parlance, a class is a blue print used to create different instances, called objects, of a domain entity. Classes encapsulate state and behavior of domain entity.  State is maintained using member variables and behavior using member functions.

For example, Bicycle is a class, it has three members : model name, top speed and number of gears. Based on this blueprint, we can instantiate objects like bike with 2 gears and 15 as top speed, another bike with 6 gears and 20 as top speed etc. Behaviors like change the top speed, get the number of gears are all encapsulate with in class and available in all instances of class Bicycle.

/**
 * Created by sangar on 22.10.17.
 *
 */
package Examples;
public class Bicycle {
    
    private int gears;
    private int topSpeed;
    
    public Bicycle(int gears, int topSpeed){
        this.gears = gears;
        this.topSpeed = topSpeed;
    }
    
    public int getGears(){
        return this.gears;
    }
    
    public void service(){
        System.out.println("Servicing for bike with grear"
           + this.gears + "costs $10");
    }
}

Are all classes used to create objects? Answer is no, as there are abstract classes and static classes, which cannot be instantiated as usual classes.

An abstract class contains one or more abstract methods. Abstract method does not have implementation, it only has declaration. An abstract class requires subclass to provide implementation of methods and instantiate it.

Continuing above example of bicycle, it would be nice if we can define different types of bicycles like city bikes, mountain bikes and racing bikes. All these bikes will have some common behavior like service, but the implementation of that behavior would be different for each one it. In this scenario, Bicycle becomes an abstract class and declares member function without implementation. City bike,  Mountain bike and Racing bike are sub-classes of bicycle abstract class and implement the service method.

package Examples;

/**
 * Created by sangar on 22.10.17.
 */
public abstract class AbstractBicycle {
    //member fields
    protected int gears;
    protected int topSpeed;
    protected int serviceCost;

    public int getGears(){
        return this.gears;
    }

    public abstract void service();
}
package Examples;

/**
 * Created by sangar on 22.10.17.
 */
public class MountainBike extends AbstractBicycle {

    public MountainBike(int gear,
                        int topSpeed, int serviceCost){
        this.gears = gears;
        this.topSpeed = topSpeed;
        this.serviceCost = serviceCost;
    }

    public void service() {
        System.out.println("Servicing Mountain Bike :");
        System.out.println("Oil the chain");
        System.out.println("Check tyre pressure");
        System.out.println("Oil brake");
    }
}
package Examples;

/**
 * Created by sangar on 22.10.17.
 */
public class CityBike extends AbstractBicycle {
    public CityBike(int gear,
                    int topSpeed, int serviceCost){
        this.gears = gears;
        this.topSpeed = topSpeed;
        this.serviceCost = serviceCost;
    }

    public void service() {
        System.out.println("Servicing City Bike :");
        System.out.println("Oil the chain");
        System.out.println("Fix the lock");
    }
}
package Examples;

/**
 * Created by sangar on 22.10.17.
 */
public class BicycleStore {

    public static void main(String[] args){
        MountainBike mountainBike = new MountainBike(3,15,7);
        mountainBike.service();

        CityBike cityBike = new CityBike(6,10,10);
        cityBike.service();
    }
}

On the other hand, static class is a class, which is never required to be instantiated. It contains member values which will not differ across instance and methods, which do not depend on external context.  In Java specifically, nested classes are static in nature.

What does a class contain?

As mentioned above, a class will contain member fields and functions. A class can also have one or more constructors.

Member fields can be public, private or protected. In short, public members can be access from outside of the class, using objects. Private members are not visible outside of class and can only be accessed by member functions of class itself. Protected ones have different rules when it comes to inheritance, but usually, accessible from within package, subclass and not accessible to outside world.

Access control specifier in Java

One can have setters and getters on fields of class. Setters set the value of fields and getters retrieve value of fields.

We will be discussing more advance things in following posts, so please stay tuned. Also, if you find anything wrong or missing, please let us know. We would be more than happy to correct things.

 

Find Minimum Cost Path in a 2D Matrix: DP

Problem statement is : Given a 2D matrix, Cost[][], where Cost[i][j] represent the cost of visit cell Cost[i][j], find minimum cost to reach a given cell Cost[n][m], where a cell can be reach from it’s left (by moving one step right) or from top (by moving one step down).

Finding Minimum Cost Path in a 2D Matrix

This problem is very similar to what we saw in Finding possible paths in grid. As mentioned there, grid problem reduces to smaller subproblems one we make a choice at a cell. In this problem, to reach a cell (i,j), we can either come from cell ( i-1, j ) from the top, or from cell (i, j-1) from left, whichever is minimum.

CostToMove(i,j) = Min(CostToMove(i-1,j), CostToMove (i, j-1)) 
                 + Cost(i,j)

This equation gives hint at recursion. What should be terminating condition for recursion. It’s obvious, when we reach desired cell.

findCost(i,j, cost) = cost(i,j) + Min( findCost(i-1,j, cost),
                                       findCost(i,j-1, cost))

Recursive implementation of recursion equation to find minimum cost to reach a given cell in matrix.

#include <stdio.h>
#include <stdlib.h>

int min( int a, int b){
    if(a > b) return b;
    return a;
}

int findMinimumCostPath(int Cost[3][3], int i, int j){
	
    if( i == 0 && j == 0 )
        return Cost[0][0];
	
    if( i == 0 )
	return findMinimumCostPath(Cost,i, j-1)
               + Cost[i][j];
    if( j == 0) 
    	return findMinimumCostPath(Cost,i-1, j)
               + Cost[i][j];
    	
    return min(findMinimumCostPath(Cost,i-1, j), 
    		   findMinimumCostPath(Cost,i, j-1)
                   + Cost[i][j]);
}
int main()
{
    int M,N; 
    
    M = N = 3; 
    int Cost[3][3] = {
        1,3,4,
    	5,3,2,
        3,4,5
    };
    printf("Minimum cost of path : %d" ,
         findMinimumCostPath(Cost, M-1, N-1));
    
}

Complexity of recursive method is exponential.

Notice that this problem satisfies two basic conditions to apply dynamic programming.

Optimal Sub-structure:- Optimal solution to bigger problem involves optimal solutions to subproblems.

Overlapping Subproblems:- Subproblems once computed can be stored in a table for further use. This saves the time needed to compute the same sub-problems again and again.

To find more about these properties, please refer : Dynamic programming basics.

Solution to this problem depends on solution to subproblems and we can see that there are many subproblems which are calculated again and again. So, this problem is ideal candidate for applying dynamic programming.

Let’s create a two dimensional matrix and fill it up bottom up. Let’s consider the top most row in matrix. Any cell in top most row can be reach only from left.

MinCost(0,j) = MinCost(0,j-1) + Cost[0][j]

Similarly, cell in leftmost column can only be reached from top.

MinCost(i,0) = MinCost(i-1,0) + Cost[i][0]

For all other cells,

MinCost(i,j) = Min( MinCost(i-1),j), MinCost(i, j-1)) + Cost[i][j]

Implementation of above equation is below:

#include <stdio.h>
#include <stdlib.h>

int min( int a, int b){
    if(a > b) return b;
    return a;
}

int findMinimumCostPath(int Cost[3][3], int M, int N){
	
    int MinCost[M][N]; //declare the minCost matrix

    MinCost[0][0] = Cost[0][0];

    // initialize first row of MinCost matrix
    for (int i=1; i<N; i++){
        MinCost[0][i] = MinCost[0][i-1] + Cost[0][i];
    }

    for (int i=1; i<M; i++){
        MinCost[i][0] = MinCost[i-1][0] + Cost[i][0];
    }
    
    for (int i=1;i<M; i++){
    	for (int j=1; j<N; j++){
           MinCost[i][j] = min(MinCost[i-1][j],
                           MinCost[i][j-1]) + Cost[i][j];
        }
    }

    return MinCost[M-1][N-1];
    
}
int main()
{
    int M,N; 
    M = N = 3; 
    int Cost[3][3] = {
    	1,3,4,
    	5,3,2,
    	3,4,5
    };
    printf("Minimum cost of path : %d" ,
            findMinimumCostPath(Cost, M, N));
    
}

Complexity of dynamic programming approach to find minimum cost path in grid is O(n2) with additional space complexity of O(n2).

There is extension to the problem which is to find actual path that lead to the destination. Solution is simple, start from the destination cell, as that will be part of path anyways, start moving either to cell to left or top whichever is minimum till you reach origin cell.

There is one more variant of this problem, when we can move from left to right, top to down and diagonally as well. Nothing changes in solution except that we need to take minimum of three cells (left, top and diagonal).

Please share if there is something wrong or missing. If you want to contribute, please write to us 

Boolean Parenthesization Problem

Counting Boolean Parenthesis

Let’s say there is a boolean expression, a string which contains True or False as operands and between each pair of operand we have boolean operator (and, or and xor).  An example of a boolean expression is T and F or T xor F. Ask of boolean parenthesization problem is to find number of ways in which this boolean expression can be parenthesized so that expression evaluates to True. For example :

Expression : T ^ F & T
Two ways : ((T ^ F) & T) and (T ^ (F & T))

T | T & F ^ T
Four ways : ((T|T)&(F^T)), (T|(T&(F^T))), (((T|T)&F)^T) 
and (T|((T&F)^T))

Let’s understand the concept first for solution and then we will see how can we implement it. If we start with an expression with only one boolean value T or F, how many way we can parenthesized it? Well, there are two answers to it. For T, there is one way, (T), whereas for F, there no way we can parenthesize to evaluates True. First take away for this insight is that we got the base case for recursion, if our solution is recursive. Second take away is that there can be two different outputs an expression or subexpression can evaluates and we have to store them both.

if T(i,j) is number of ways expression from i to j can be parenthesized so that it evaluates to True. Similarly, F(i,j) is number of ways expression evaluates to False. From above expression we know that :

T(i,i) = 1 if operand is T
         0 if operand is F

F(i,i) = 0 if operand is T
         1 if operand is F

How to calculate T(i,j)?  Let’s try to split this problem into smaller subproblems and reach at base case. If take an index k such that i<k<j, we have two smaller subproblems to solve now, T(i,k) and T(k+1,j). How can solution to these subproblems can be combined to get solution to bigger problem T(i,j). This is tricky part and depends on operator (and, or, xor) after k operand.

If there is “and” operand between two subexpressions (i,k) and (k+1,j) then how can expression (i,j) be True? Yes, only if expression (i,k) and expression (k+1,j) are True. For “and” operator, T(i,j) is:

T(i,j)  = Summation ( T(i,k) * T(k+1,j)) for all k such that i < k < j

How about expression (i,j) being evaluates to False? Simple enough, one of the two expression should evaluate to False.

F(i,j) = Sum ( Total (i,j) - T(i,k)* T(k+1)) for all k for i< k< j

What is Total(i,j) here?

Total(i,j) =   Total(i,k) *  Total( k+1, j) )
Total(i,k) =   T(i,k) + F(i,k)
Total(k+1,j) = T(k+1,j) + F(k+1,j)

In short, Total(i,j) = T(i,j) + F(i,j)

When operand between (i,k) and (k+1, j) is or:

T(i,j) = sum ( Total(i,j) - F(i,k)* F(k+1,j)) for k such i<k

In the same vein, T(i,j) and F(i,j) when operand is xor will be

T(i,j) = sum(T(i,k)*F(k+1,j) + F(i,k)* T(k+1,j)) for k such i<k

To find solution to problem all we need to find is T(1,n). Now, let’s see how to implement the approach.

Dynamic programming implementation for boolean parenthesization problem

#include<stdio.h>
#include<string.h>
 
int countNumberOfWaysToParenthesize(char operands[], char operators[], int n)
{
    int F[n][n], T[n][n];
 
    for (int i=0; i<n; i++){
        F[i][i] = (operands[i] == 'F')? 1: 0;
        T[i][i] = (operands[i] == 'T')? 1: 0;
    }
 
    for (int L=1; L<n; L++) {
    	for (int i=0, j=L; j<n; ++i, ++j){
            T[i][j] = F[i][j] = 0;
            for (int count=0; count<L; count++){
                int k = i + count;
                int totalIK = T[i][k] + F[i][k];
                int totalKJ = T[k+1][j] + F[k+1][j];
                if (operators[k] == '&') {
                    T[i][j] += T[i][k]*T[k+1][j];
                    F[i][j] += (totalIK *totalKJ - T[i][k]*T[k+1][j]);
                }
                if (operators[k] == '|'){
                    F[i][j] += F[i][k]*F[k+1][j];
                    T[i][j] += (totalIK*totalKJ - F[i][k]*F[k+1][j]);
                }
                if (operators[k] == '^'){
                    T[i][j] += F[i][k]*T[k+1][j] + T[i][k]*F[k+1][j];
                    F[i][j] += T[i][k]*T[k+1][j] + F[i][k]*F[k+1][j];
                }
            }
        }
    }
    return T[0][n-1];
}
 
// Driver program to test above function
int main()
{
    char operands[] = "TTFT";
    char operators[] = "|&^";
    int n = strlen(operands);
 
    printf ("\n Number of ways to parenthisize expression : %d", 
                 countNumberOfWaysToParenthesize(operands, operators, n));
    return 0;
}

We will have two strings, one string operands[] represents all operands (T or F) and other string operations[] representing all operator (‘&’, ‘|’ , ‘^’).  Operands[i] is inserted at operands[i+1] to generates the expression. Below is the code which uses this definition and provides implementation of method discussed earlier.

Complexity of this approach to find ways to parenthesize a boolean expression so that it evaluates to True is O(n3). There is also O(n2) space complexity.

Please share if there is something missing or wrong. If you want to contribute to algorithms and me, please contact us.

Dynamic Programming : Matrix chain multiplication

Matrix chain multiplication using dynamic programming

What is matrix multiplication? To read on that please refer to Wiki.
However, today’s problem is not about actually multiplying matrices, but to find out an order in which we multiply matrices so that number of multiplication done are minimum.

There are two basic properties of a matrix : number of rows and number of columns in matrix. Row and columns are called as dimensions of matrix.
Given N matrices of (MxN dimensions). Find an order in which these matrices are multiplied, so that total number of scalar multiplications are least. and dimensions of each matrix given in an array P where P[i-1] and P[i] denote rows and column respectively of ith matrix.

Matrix chain multiplication is a typical problem which is used to explain dynamic programming. Approach learnt here can be easily applied to many other problems.

Before going further, lets understand some points about matrix multiplication

1. Matrix multiplication is associative i.e.  A* (B*C) = (A*B) *C
2. It is not commutative i.e  A * (B*C) not equal to A * (C * B)
3. To multiply two matrices, they should be compatible i.e. no of columns in first matrix should be equal to number of rows of second matrix.
No of columns of first matrix = No. of rows of second matrix

Let’s get back to problem at hand.  Look at this problem as this : Figure out a way to put parenthesis around matrices so that total number of multiplication are least.

Brute force way is to find out every possible combination of parenthesis and check which one is minimum.
This is approach can be implemented in recursive way, once we put a parenthesis, problem is reduced putting parenthesis around N-1 matrices to be multiplied. However, code will be exponential in time and hence of no use for every large input.

Let’s see if dynamic programming fits in. We can see that cost of multiplying matrices Ai to Aj  is cost of

Cost (Ai, Aj) = Cost(Ai,Ak) + Cost(Ak+1,Aj )+(P[i-1] * P[k] * P[j])

Idea is to find out K such that cost(Ai, Aj) becomes minimum. If M[i,j] represents the cost to multiply matrix i to matrix j, then,

M[i,j]  = M[i,k] + M[K+1,j] + ((P[i-1] * P[k] * P[j])

When calculating M[i,j]; M[i,k] and M[k+1,j] should be already available. Also, M[i,i] = 0 as cost of multiplying a single matrix will be 0.

Looking at the requirement to calculate M[i,j], it depends on length of chain. Length of chain from matrix i to matrix j would be i+j-1. We start with length L = 2 and go on to solve the table entry for length N. M[1, N] will give us the final cost.

We will start with L =2, chain we are considering is of length of 2 as j =  i + L -1  = 1 +2 -1 =2.  We will start with i =1 and loop runs to i = N-L +1 and considering two matrices at time.

Length being considered will vary for L =2 to L = N. Based on L, j will also change and we will be finding minimum cost(i,j) for each i and j. To find minimum cost(i,j), we need to find a K such that expression

Cost (Ai, Aj) = Cost(Ai,Ak) + Cost(Ak+1,Aj )+(P[i-1] * P[k] * P[j])

becomes minimum. This is top down filling of table where M [1, 2] will be filled first and then going up to fill till M [1,N].

Implementation for multiplication of matrix problem

#include<stdlib.h>
#include<stdio.h>

#define MAX_INT 10000000

int matrixMultiplication(int p[], int N){

  int L,i, j, temp;

  int M[N][N];

  for(i=0; i<N; i++){
  	for(j=0; j<N; j++){
       M[i][j] = 0;
  	}
  }       

  for(L=2; L<N; L++){
    /* For every position i, we check every chain of len L */
    for(i=1; i<N-L+1; i++){
    	
        j = i+L-1;
        M[i][j] = MAX_INT;
        /* For matrix i to j, check every split K */
            for(int k=i; k<j; k++){
                temp = M[i][k] + M[k+1][j] + p[i-1] * p[k] * p[j];
               /* Check if the current count is less than minimum */
                if(temp < M[i][j]){
                    M[i][j] = temp;                 
                }
            }
        }
    }
    for(i=1; i<N; i++){
    	for(int k=1; k<N; k++){
    		printf("%d ", M[i][k]);
    	}
    	printf("\n");
    }
    
 return M[1][N-1];
}
/* Driver program to run above code */
int main(){

    int p [] ={10, 20, 30, 40, 30};
    int n = sizeof(p)/sizeof(p[0]);
    
    printf("%d\n", matrixMultiplication(p,n));
    
    return 0;
}

Let’s run through an example and understand how does this code work?

P = {10, 20,30,40,30}, 
dimensions of matrix [1] = 10X20, 
dimensions of matrix [2] = 20X30, 
dimensions of matrix [3] = 30X40,
dimensions of matrix [4] = 40X30

Function will be called with array P and size which is 5.
Table of N size is declared. M[6][6] because we will need to store M[1][4].

Diagonal of the table is set to zero, as cost of multiplication of single matrix is 0.

M[0][0] = M[1][1] =M[2][2]=M[3][3]=M[4][4]= M[5][5]=M[6][6] =0

Start with for loop with L =2. Inner for loop will run from i =1  to N-L+1 = 5-2+1 =4, j will be i +L -1. which is 2.
Inner most loop runs with K=1 (i) till k = 2 (j). M[1,2] is INT_MAX.

temp = M[1,1] + M[2,2] + P[0] *P[1]*P[2] = 6000.

Since it is less than INT_MAX M[1,2] = 6000.
Similarly, for i =2, j = 2 + 2 -1 =3. Above process is followed and M[2,3] = 24000 and so on.

M[3,4] = P[2] * P[3] * P[4] = 36000.

Coming to L =3. i =1, j = 4-1 =3. K =1 to K <3.
For K =1,

temp =  M[1,1] + M[2,3] + P[0] * [1] * P[3] = 24000 +8000 = 32000

For K=2,

temp = M[1,2] + M[3,3] + P[0]*[2]*P[3] = 6000 + 12000 = 18000

Hence M[1,3] = min(32000, and 18000) = 18000 with i = 1.

With i =2. j =2+3-1 =4. K =2 to K <4
For K =2,

temp = M[2,2] + M[3,4] + P[1] * P[2]*P[4] =  36000 + 18000 = 54000.

For K =3,

temp = M[2,3] + M[4,4] + P[1] * P[3]*P[4] =  32000 + 24000 =  66000.

Hence M[1,3] remains 18000.
Same process is followed and the entries are filled. Finally, matrix will look like this :

0 6000 18000 30000 
0 0 24000 48000 
0 0 0 36000 
0 0 0 0

M[1,N-1] will be solution to our problem.

Time complexity of matrix chain multiplication using dynamic programming is O(n2). Also space complexity is O(n2).
Reference
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Dynamic/chainMatrixMult.htm

If you want to contribute to website, please contact us. Please share if you there is something wrong or missing.

Longest common substring

Given two string A and B, find longest common substring in them. For example, A = “DataStructureandAlgorithms” and B=“Algorithmsandme”, then longest common substring in A and B is “Algorithms”

Brute force solution is to find all substrings of one string and check any of these substring are substring of second string, while comparing, keep track of the longest one we found. There can be n2substring for a string with length n and to find if a string is substring of another, it takes another m operations, where m is length of second string. Hence, overall complexity of this method is O(n2m).

Can we do better than that?

Longest common substring using dynamic programming

This solution is very similar to Longest common subsequence. Difference between two problems is that a subsequence is collection of characters, which may or may not be contiguous in string, where for a substring, characters must be contiguous. Based on this difference, out solution will vary a bit.

Let’s create a two dimensional array called LCS with dimensions as n and m. LCS[i][j] represents the length of longest common substring in A[0..i] and B[0..j].

As in case of longest common subsequence, we will start with smaller case first. What if one of the string is empty?  If one of the string is empty then, LCS = 0. Hence, LCS[i][0] = 0 and LCS[0][j] = 0.

How to fill LCS[i][j]?

1. Check if A[i] is equal to B[j] 
   1.1 If yes, LCS[i][j] = 1 + LCS[i-1][j-1]
( Because new character is added to already common substring, 
     if any, till A[0...i-1] and B[0,,j-1])
   1.2 if both characters are not same, LCS[i][j] = 0,
       ( Because if characters are not same, there cannot be any
         common substring including A[i] and B[j]. 
2. Traverse the matrix and find the maximum element in it, that will be the length of Longest Common Substring.
(This step can be optimized by keeping track of max length while calculating LCS[i][j]).

Implementation

#include <stdio.h>
#include <string.h>
 
int max(int a, int b){
	return a>b ? a:b;
}
 int longestCommonSubstring(char * A, char * B){
 	int lenA = strlen(A);
 	int lenB = strlen(B);
 
	int LCS[lenA+1][lenB+1];
 
	for (int i=0; i <= lenA; i++){
		LCS[i][0] = 0;
	}
 
	for (int j=0; j <= lenB; j++){
		LCS[0][j] = 0;
	}
 
	int maxLength = 0;
 
	for (int i=1; i<= lenA; i++){
		for (int j=1; j <= lenB; j++){
			if (A[i] == B[j]){
				LCS[i][j] = 1 + LCS[i-1][j-1];		
				maxLength = max( maxLength, LCS[i][j] );
			} 
			else {
				LCS[i][j] = 0;
			}
	    }
	}
	return maxLength;
}
 
int main(void) {
	char *a = "ABCDEFGSE";
	char *b = "EBCDEFGV";
 
	printf("\n Longest common substring : %d",
			longestCommonSubstring(a,b));
 
	return 0;
}
 

Time complexity of dynamic programming approach to find length of longest common substring in two string is O(n*m) and space complexity is O(n*m) where n and m are lengths of two given strings.

In next post, we will discuss suffix tree method to find LCS which is more optimized than DP solution and can be easily be generalized for multiple strings.

Please share if you find something wrong or missing. If you want to contribute to site, please refer contact us. We would be happy to publish your work and in turn will pay you too.

Apache Spark : Introduction

Apache Spark

Before understanding what is Apache Spark and how it helps in data pipelines, let’s understand history behind big data and map reduce. Map reduce introduced ability to process vast amount of data on commodity hardware. Map reduce implementations inherently had fault tolerance, locality aware scheduling and load balancing. Programming models supported acyclic flow of data through data models. These are good for most kind of applications, however there are two big class of applications which were not completely and satisfactorily solved by existing programming models and implementations of Map Reduce. Those application classes are :

  • Iterative applications

Application where output of a stage is fed as input to one of previous stage to refine things. Machine learning application are usually of this nature where same datasets are processed again and again to refine a feature. Available map reduce implementation Hadoop was inefficient in handling this use case because in Hadoop communication between two stages of pipeline happens through HDFS file system, that is disk. A stage has to process data, and then put data on to disk where next stage can pick up again. This leads to delays in processing and there is no optimization to support iterative applications.

Apache Spark introduction
Typical data processing in Hadoop
  • Interactive applications

This are applications which are ETL in nature with requirement of a console where data is to be presented. User can change input and then ask for different set of data based on input. Again, in Hadoop, each such request will be treated independently and every request will fetch data from disk. This leads to delays and application does not remain interactive anymore.

Comparison b/w Hadoop and Spark
Interactive system using Map Reduce

To solve challenges of these applications, different tools were introduced, like Apache Storm for stream processing or Hive and Pig for interactive usage of Hadoop, but none of these tools were solving both the problem at same time. Also, they lacked the abstraction so that they can be used for any general purpose.

Apache spark introduction
Specialized map reduce systems for solving specific problems

How does Apache Spark solves challenges of these types of applications?

Apache Spark : In memory data storage and processing

From above analysis, we understood that the basic problem with Hadoop implementation of Map reduce is disk access. If data which is read from disk can be read from memory, system becomes efficient. Apache Spark does exactly that. All of the data is stored in memory as abstract entities called Resilient Distributed Datasets (RDDs). More on that later.

Two questions come to mind when we say data is in memory: First, how fault tolerance works as data may be lost if a node goes down? Second, what if data is more than memory available at node? Fault tolerance is solved by maintaining meta data ( Step that led to creation of this RDD) about how RDDs on that nodes are created and if node goes down, it can re-run the steps which led to RDDs at this node and get whole data. More how RDDs can be created and used in Spark later.

Second, if data is more than memory available, it is written on disk, this affects the performance, but it is edge case and not a normal working condition.

This was the brief introduction to Apache Spark. We will continue on this thread in coming post. Stay tuned.

If you see something is missing or wrong, please share and we will fix it.