Count ways to reach nth stair

Count ways to reach nth stair

Problem is to count number of ways a person can reach to top of n stairs, given that he can move 1 or 2 stairs at a time. For example,

count number of ways to reach nth stair

n = 1, number of ways 1
n = 2, number of ways 2 (1,1), (2)
n = 4, number of ways 5 (1,1,1,1), (1,1,2), (1,2,1),(2,1,1), (2,2)

This problem is usually asked to check if you have basic understanding about recursion and can you reduce problems to subproblems and solve them recursively. Counting number of ways to reach nth stair is nothing but modified version of fibonacci series.

Count ways to reach nth stair : reduction

Original problem to climb nth stair is reduced to (n-1)th stair once person moves 1 stair or to (n-2)th stair, if person moves 2 stairs. Now, all you have to solve is to count number of ways person can climb (n-1) and (n-2) stairs.

Ways(n) = Ways(n-1) + Ways(n-2)

Now that you have come up with recurrence relation, when do you stop? Termination condition is must for any recursion. Think what happens when person has only 1 stair to climb? In how many ways he can do it? There is only one possible way, to climb that stair. What if there are last two stairs? Then there are two ways to solve, climb one step at a time or climb  two stairs. This gives us base cases

Ways(1) =  1; Ways(2) = 2

Recursive implementation to count ways to reach nth stair

#include <stdio.h>

int findWays(int n)
{
   if(n==1 || n==2) return n; 
   return findWays(n-1) + findWays(n-2);
}

int main ()
{
  int n = 4;
  printf("Number of ways = %d", findWays(n));
  return 0;
}

Complexity of recursive implementation to count ways to reach nth stair is exponential. Look at the recursive tree of implementation and it becomes evident that there are many subproblems which are calculated multiple times.

count number of ways to nth stair

What is the best way to avoid recalculation? Memoization. Replace recursive function with stored value. Going further, we can use dynamic programming. Create a 1 dimensional array and initialize A[1] =1 and A[2] = 2. Now, to calculate A[3], we can use information already stored in A[1] and A[2]. A[4] can be calculated using A[3] and A[2].

Dynamic programming implementation to find number of ways t nth stair

#include<stdio.h>

int countWays(int n ) {
    int table[n+1];
    table[1] = 1; table[2] = 2;
    for (int i=3; i<=n; i++){
       table[i] = table[i-1] + table[i-2];
    }
    return table[n];
}

int main ()
{
    int n = 4;
    printf("Number of ways = %d", countWays(n));
    return 0;
}

Complexity of dynamic programming approach is O(n) and with additional space complexity O(n).

Please share if there is something is missing or wrong. If you want to contribute to algorithms and me, please contact us.

Box stacking problem – Dynamic programming

Box stacking problem

Consider, we have been given 3-Dimensional boxes. Each box has width, depth and height (wi, di, hi). Box stacking problem is to stack these boxes in such a way that we achieve maximum height. There is one condition which is attached to it : A box can be placed on top of another only if both it’s base dimensions width and depth are less than box on which it stacked on. There is no restriction on height, a tall box can be placed on a short box.

Box stacking problem

With conditions in place, with given n boxes, we are actually, we are building a pyramid of boxes with maximum height.

This problem is closely related to longest increasing subsequence.

Recurrence relation for box stacking problem

Solution involves understanding of rotation aspects of the boxes. To avoid this aspect affect our solution, we list down all rotations of boxes as individual boxes. Therefore, for each box we have three representations. For example, for a box with dimensions a,b,c  such that a>b>c

representation 1 : <h=a, w=b, d=c> 
representation 2 : <h=b, w=a, d=c> 
representation 3 : <h=c, w=a, d=b>

Without losing generalization, we can avoid representation where wi<di. Now that we have three representations for each box, our input space increases to 3XN and solution will be using these 3N boxes. There is another catch here. This solution works only when there are multiple instances of each box and we can use two different orientations of same box while fetching maximum height.

Finding the sort order

Another problem is these boxes which are given to us are not ordered in any form. However, to stack boxes, we need to consider them in some order. As height does not affect stacking order, we can ignore it. Now, we have to consider only two dimensions.

Let’s say, we order boxes on base area in decreasing order. How does it work? Consider two boxes with different base areas. It is impossible for a box with larger base area to be stacked on top of a box with smaller base area. There are only two dimensions, so at least one must be larger than corresponding dimension smaller base area box. Therefore, if a box within our sequence can’t be placed on top, no box with a greater area can be placed on top either.

Let H(i) be height of stack of boxes 1,2,3,4…i. Modeling recurrent relation for H(i), we want to put box i on a box j such that wi<wj and di <dj and H(j) is maximum for all j less than i.

H(i) = max(H(i), H(j) for all j < i such that wi<wj & di<dj ) + hi

Finally,output will be max of all H[i].

Box stacking problem using dynamic programming implementation

Implementation is quite simple, we need one dimension array H[]. Each element H[i] represents H[i] using boxes from i to i. These boxes are already sorted by area in decreasing order.

#include <iostream>
#include <algorithm>
using namespace std;

typedef struct box {
    int width;
    int depth;
    int height;
} Box;

bool boxComparator(Box b1, Box b2) {
    return ( b1.depth * b1.width > b2.depth * b2.width );
}
 
int findMaxHeightBoxStack(Box boxes[], int n)
{
	int H[n];
	for(int i=0; i<n; i++){
		H[i] = boxes[i].height;
	}
    for(int i=1; i<n; i++){
    	for( int j=i-1; j>=0; j--){
    		if(boxes[j].width > boxes[i].width 
    		   && boxes[j].depth > boxes[i].depth 
    		   && H[j] + boxes[i].height){
    		   	H[i] = H[j] + boxes[i].height;
    		}
    	}
    }
	
	int maxHeight = 0 ;
	for(int i=0; i<n; i++){
		if(maxHeight < H[i]){
			maxHeight = H[i];
		}
	}
	return maxHeight;
}

int boxStacking(Box boxes[], int n)
{
    Box orientations[3*n]; //for rotations
    int index = 0;
    for(int i=0; i<n; i++){
        orientations[index] = boxes[i]; // first one as such
        index++;
		
        orientations[index].height = boxes[i].width;
        orientations[index].width = max( boxes[i].height, boxes[i].depth) ;
        orientations[index].depth = min( boxes[i].height, boxes[i].depth);
	index++;

        orientations[index].height = boxes[i].depth;
        orientations[index].width = max( boxes[i].height, boxes[i].width) ;
        orientations[index].depth = min( boxes[i].height, boxes[i].width) ;
        index++; 
     }
     n = 3*n;

     sort(orientations, orientations+n, boxComparator);
     return findMaxHeightBoxStack( orientations, n);
}
 
// Driver program
int main()
{
    //Job jobs[] = {{3, 10, 20}, {1, 2, 50}, {6, 19, 100}, {2, 100, 200}};
    Box boxes[] = { {4, 6, 7}, {1, 2, 3}, {4, 5, 6}, {10, 12, 32} };
    int n = sizeof(boxes)/sizeof(boxes[0]);
    cout << "Maximum height is " << boxStacking(boxes, n);
    return 0;
}

Complexity of algorithm to find maximum height in box stacking problem is O(n2) and space complexity is O(n).

This problem can be extended by putting boxes with K dimensions instead of 3 dimensions. Then also, approach would be the same only number of orientations will change.

Please share if there is something is wrong or missing. If you want to contribute to algorithms and me, please contact us.

Reference : https://people.cs.clemson.edu/~bcdean/dp_practice/dp_5.swf

Boolean Parenthesization Problem

Counting Boolean Parenthesis

Let’s say there is a boolean expression, a string which contains True or False as operands and between each pair of operand we have boolean operator (and, or and xor).  An example of a boolean expression is T and F or T xor F. Ask of boolean parenthesization problem is to find number of ways in which this boolean expression can be parenthesized so that expression evaluates to True. For example :

Expression : T ^ F & T
Two ways : ((T ^ F) & T) and (T ^ (F & T))

T | T & F ^ T
Four ways : ((T|T)&(F^T)), (T|(T&(F^T))), (((T|T)&F)^T) 
and (T|((T&F)^T))

Let’s understand the concept first for solution and then we will see how can we implement it. If we start with an expression with only one boolean value T or F, how many way we can parenthesized it? Well, there are two answers to it. For T, there is one way, (T), whereas for F, there no way we can parenthesize to evaluates True. First take away for this insight is that we got the base case for recursion, if our solution is recursive. Second take away is that there can be two different outputs an expression or subexpression can evaluates and we have to store them both.

if T(i,j) is number of ways expression from i to j can be parenthesized so that it evaluates to True. Similarly, F(i,j) is number of ways expression evaluates to False. From above expression we know that :

T(i,i) = 1 if operand is T
         0 if operand is F

F(i,i) = 0 if operand is T
         1 if operand is F

How to calculate T(i,j)?  Let’s try to split this problem into smaller subproblems and reach at base case. If take an index k such that i<k<j, we have two smaller subproblems to solve now, T(i,k) and T(k+1,j). How can solution to these subproblems can be combined to get solution to bigger problem T(i,j). This is tricky part and depends on operator (and, or, xor) after k operand.

If there is “and” operand between two subexpressions (i,k) and (k+1,j) then how can expression (i,j) be True? Yes, only if expression (i,k) and expression (k+1,j) are True. For “and” operator, T(i,j) is:

T(i,j)  = Summation ( T(i,k) * T(k+1,j)) for all k such that i < k < j

How about expression (i,j) being evaluates to False? Simple enough, one of the two expression should evaluate to False.

F(i,j) = Sum ( Total (i,j) - T(i,k)* T(k+1)) for all k for i< k< j

What is Total(i,j) here?

Total(i,j) =   Total(i,k) *  Total( k+1, j) )
Total(i,k) =   T(i,k) + F(i,k)
Total(k+1,j) = T(k+1,j) + F(k+1,j)

In short, Total(i,j) = T(i,j) + F(i,j)

When operand between (i,k) and (k+1, j) is or:

T(i,j) = sum ( Total(i,j) - F(i,k)* F(k+1,j)) for k such i<k<j
F(i,j) = sum ( F(i,k) * F(k+1, j)) for k such that i<k<j

In the same vein, T(i,j) and F(i,j) when operand is xor will be

T(i,j) = sum(T(i,k)*F(k+1,j) + F(i,k)* T(k+1,j)) for k such i<k<j
F(i,j) = sum(T(i,k) * T(k+1,j) + F(i,k)*F(k+1,j)) for k such that i<k<j

To find solution to problem all we need to find is T(1,n). Now, let’s see how to implement the approach.

Dynamic programming implementation for boolean parenthesization problem

We will have two strings, one string operands[] represents all operands (T or F) and other string operations[] representing all operator (‘&’, ‘|’ , ‘^’).  Operands[i] is inserted at operands[i+1] to generates the expression. Below is the code which uses this definition and provides implementation of method discussed earlier.

#include<stdio.h>
#include<string.h>
 
int countNumberOfWaysToParenthesize(char operands[], 
                                    char operators[], 
                                    int n){
    int F[n][n], T[n][n];
 
    for (int i=0; i<n; i++){
        F[i][i] = (operands[i] == 'F')? 1: 0;
        T[i][i] = (operands[i] == 'T')? 1: 0;
    }
 
    for (int L=1; L<n; L++) {
    	for (int i=0, j=L; j<n; ++i, ++j){
            T[i][j] = F[i][j] = 0;
            for (int count=0; count<L; count++){
                int k = i + count;
                int totalIK = T[i][k] + F[i][k];
                int totalKJ = T[k+1][j] + F[k+1][j];
                if (operators[k] == '&') {
                    T[i][j] += T[i][k]*T[k+1][j];
                    F[i][j] += (totalIK *totalKJ - T[i][k]*T[k+1][j]);
                }
                if (operators[k] == '|'){
                    F[i][j] += F[i][k]*F[k+1][j];
                    T[i][j] += (totalIK*totalKJ - F[i][k]*F[k+1][j]);
                }
                if (operators[k] == '^'){
                    T[i][j] += F[i][k]*T[k+1][j] + T[i][k]*F[k+1][j];
                    F[i][j] += T[i][k]*T[k+1][j] + F[i][k]*F[k+1][j];
                }
            }
        }
    }
    return T[0][n-1];
}

int main(){
    char operands[] = "TTFT";
    char operators[] = "|&^";
    int n = strlen(operands);
 
    printf ("\n Number of ways to parenthesize expression : %d", 
                 countNumberOfWaysToParenthesize(operands, operators, n));
    return 0;
}

Complexity of this approach to find ways to parenthesize a boolean expression so that it evaluates to True is O(n3). There is also O(n2) space complexity.

Please share if there is something missing or wrong. If you want to contribute to algorithms and me, please contact us.

Longest ZigZag Subsequence

Longest zigzag subsequence

A sequence of numbers is called a zig-zag sequence if the differences between successive numbers strictly alternate between positive and negative.  This post is about finding longest zigzag subsequence. Another definition is , zigzag subsequence is where elements of subsequence are alternating order, means, they satisfy below conditions:

x1 < x2 > x3 < x4 > x5 < …. xn 
or 
x1 > x2 < x3 > x4 < x5 > …. xn

A sequence with fewer than two elements is trivially a zigzag sequence.

For example, 1,9,3,9,1,6 is a zig-zag sequence because the differences (8,-6,6,-8,5) are alternately positive and negative. In contrast, 1,6,7,4,5 and 1,9,4,4,5 are not zig-zag sequences, the first because its first two differences are positive and the second because its last difference is zero.

Given a array of integers, find length of the longest zigzag subsequence.

Intention of this post to utilize what we learned when we solved longest increasing subsequence in array using dynamic programming. This problem was asked in top coder. If you want to see detailed problem and other examples, please refer : Zigzag problem at top coder.

Longest zigzag subsequence using dynamic programming

Let’s assume we have an array A with given integers. Table(i,0) represents the length of longest zigzag subsequence ending at i with last difference as positive, mean difference between last and second last elements of subsequence is positive.

Similarly, Table(i,1) represents length of longest zigzag subsequence ending at i with last difference being negative.

With these definitions, below relations can be worked out.

Table(i,0) = max (Table(i,0), Table(j,1) + 1); 
             for all j < i and A[j] < A[i] 
Table(i,1) = max (Table(i,1), Table(j,0) + 1); 
             for all j < i and A[j] > A[i]

How do we get it? When we are at index i of an array, this element at index i can be included in longest zigzag subsequence, then there two possibilities. First, that element at index i is greater than the last element in previous longest zigzag subsequence. In this case, we look for all j < i where A[j] < A[i] and update Table[i,0] for maximum value of Table(j,1) +1 till i for j < i.

Second, element at index i is less than last element in zigzag sequence. In this case, we look for all j < i where A[j] > A[i] and update Table(i,0) for maximum value Table(j,1) +1 till i for j< i.

What will be length of longest zigzag subsequence for index i?

Result =  max (Table(i,0), Table(i,1))

Now that we know the recursion relation, implementation is really simple.

Longest zigzag subsequence problem implementation

Complexity of dynamic programming approach to find longest zigzag subsequence is O(N2) using O(N) extra space.

Please share if there is something wrong or missing. If you want to contribute to website, please contact us.

Weighted job scheduling – dynamic programming

Weighted job scheduling

Suppose we have been give n jobs j1, j2,j3…jn with their start time s1,s2,… sn and finish time f1,f2, f3…fn. There is a value vi associated with each job. Ask is to find a schedule of these jobs such all jobs are compatible and we get maximum value.Two jobs are said to be compatible, if there execution time do not overlap. This problem is know as weighted job scheduling problem.

For example, we have four jobs as shown below:

Weighted job scheduling example

In above figure maximum value can be achieved by scheduling job 1 and job 4 which is value of 150. Notice that there one more schedule with compatible jobs (Job1, Job2 and Job 3), however, value we get by that schedule is only 100 which is less than what we got in earlier schedule.

There is strong urge to use greedy algorithm here. However, greedy algorithm works for this problem when value of all jobs is equal. Since value of jobs is different here, greedy algorithm fails.

Let’s consider brute force solution. First of all, sort all jobs based on finish time in increasing order. Now, for each job, decide if including it in schedule gives us maximum value or excluding it will give us maximum value. When we are including a job we have to check if it is compatible with other jobs which are included in schedule.

Determine compatibility of jobs

To determine compatibility quickly, we pre-calculate an array, called P such that

p(j) = largest index i < j such that job i is compatible with j.

For example:

weighted job scheduling

Now, let’s say OPT(j) represents the maximum value which we gain by adding jobs from 1 to j. As mentioned above, there are two cases:

Case 1: OPT selects job j. In this case we can not use incompatible jobs { p(j) + 1, p(j) + 2, …, j – 1 } – must include optimal solution to problem consisting of remaining compatible jobs 1, 2, …, p(j).

Case 2: OPT does not select job j. – must include optimal solution to problem consisting of remaining compatible jobs 1, 2, …, j-1

For case 1, we already have P[j] calculated. With P[j] already prepared, we know that we don’t have to check any job later than P[j] as all of them will be conflicting with current job. Recursive formula for calculating maximum value for n jobs will be:

OPT( j) = 0 if j = 0 
          max { vj + OPT( p(j) ), OPT(j-1)} otherwise

Weighted jobs scheduling- Recursive way

This recursive algorithm has exponential complexity as there are lot of subproblems which are calculated repeatedly. For example,

wieghted job scheduling using dynamic programming

Recursive execution tree for above problem would like

recursive execution of weighted jobs scheduling solution

From figure, we can see that there are subproblems which are being re-calculated. Typical way to avoid this repetition is to store solutions to subproblem, this method is called memoization. This is kind of a cache where results of subproblems are stored and looked into whenever required.

Dynamic programming implementation of weighted job scheduling

Run time complexity of dynamic programming approach for weighted job scheduling problem is O(nlogn) which is dominated by sort. Sorting takes O(nlogn) and calculation of maximum value takes O(n).
If we have pre-sorted input based on finish time, then this approach takes only O(n). Additional O(n) space is used for storing results of subproblems.

How about finding the solution itself, means to find actual jobs in schedule that gives us optimal value? This requires some post processing. Algorithm is as follows

Find-solution(j) : 
 if (j = 0) output nothing 
 else if (vj + Table[P(j)] > Table[j-1]) print j 
     Find-Solution(p(j)) 
 else Find-Solution(j-1)

Please share if there is something wrong or missing. If you are interested in contributing to algorithms and me, please drop a mail and refer publishing at algorithms and me.

Minimum Cost Path in a 2D Matrix

Minimum cost path in grid

Problem statement is : Given a 2D matrix, Cost[][], where Cost[i][j] represent the cost of visit cell Cost[i][j], find minimum cost path to reach a given cell Cost[n][m], where a cell can be reach from it’s left (by moving one step right) or from top (by moving one step down).

minimum cost path in matrix

Finding Minimum Cost Path in a 2D Matrix

This problem is very similar to what we saw in Finding possible paths in grid. As mentioned there, grid problem reduces to smaller subproblems one we make a choice at a cell. In this problem, to reach a cell (i,j), we can either come from cell ( i-1, j ) from the top, or from cell (i, j-1) from left, whichever is minimum.

CostToMove(i,j) = Min(CostToMove(i-1,j), CostToMove (i, j-1)) 
                 + Cost(i,j)

This equation gives hint at recursion. What should be terminating condition for recursion. It’s obvious, when we reach desired cell.

findCost(i,j, cost) = cost(i,j) + Min( findCost(i-1,j, cost),
                                       findCost(i,j-1, cost))

Recursive implementation of recursion equation to find minimum cost to reach a given cell in matrix.

Complexity of recursive method is exponential.

Notice that this problem satisfies two basic conditions to apply dynamic programming.

Optimal Sub-structure:- Optimal solution to bigger problem involves optimal solutions to subproblems.

Overlapping Subproblems:- Subproblems once computed can be stored in a table for further use. This saves the time needed to compute the same sub-problems again and again.

To find more about these properties, please refer : Dynamic programming basics.

Solution to this problem depends on solution to subproblems and we can see that there are many subproblems which are calculated again and again. So, this problem is ideal candidate for applying dynamic programming.

Let’s create a two dimensional matrix and fill it up bottom up. Let’s consider the top most row in matrix. Any cell in top most row can be reach only from left.

MinCost(0,j) = MinCost(0,j-1) + Cost[0][j]

Similarly, cell in leftmost column can only be reached from top.

MinCost(i,0) = MinCost(i-1,0) + Cost[i][0]

For all other cells,

MinCost(i,j) = Min( MinCost(i-1),j), MinCost(i, j-1)) + Cost[i][j]

Implementation of minimum cost path using dynamic programming

Complexity of dynamic programming approach to find minimum cost path in grid is O(n2) with additional space complexity of O(n2).

There is extension to the problem which is to find actual path that lead to the destination. Solution is simple, start from the destination cell, as that will be part of path anyways, start moving either to cell to left or top whichever is minimum till you reach origin cell.

There is one more variant of this problem, when we can move from left to right, top to down and diagonally as well. Nothing changes in solution except that we need to take minimum of three cells (left, top and diagonal).

Please share if there is something wrong or missing. If you want to contribute, please write to us and refer publishing at algorithms and me.

Longest common substring -Dynamic programming

Longest common substring

Given two string A and B, find longest common substring in them.  For example, A = “DataStructureandAlgorithms” and B=“Algorithmsandme”, then longest common substring in A and B is “Algorithms”

Brute force solution is to find all substrings of one string and check any of these substring are substring of second string, while comparing, keep track of the longest one we found. There can be n2 substring for a string with length n and to find if a string is substring of another, it takes another m operations, where m is length of second string. Hence, overall complexity of this method is O(n2m). Can we do better than that?

Longest common substring using dynamic programming

This solution is very similar to Longest common subsequence. Difference between two problems is that a subsequence is collection of characters, which may or may not be contiguous in string, where for a substring, characters must be contiguous. Based on this difference, out solution will vary a bit.

Let’s create a two dimensional array called LCS with dimensions as n and m. LCS[i][j] represents the length of longest common substring in A[0..i] and B[0..j].

As in case of longest common subsequence, we will start with smaller case first. What if one of the string is empty?  If one of the string is empty than, LCS = 0. Hence, LCS[i][0] =0 and LCS[0][j] = 0.

How to fill LCS[i][j]?

1. Check if A[i] is equal to B[j] 
   1.1 If yes, LCS[i][j] = 1 + LCS[i-1][j-1]
       ( Because new character is added to already common substring, 
     if any, till A[0...i-1] and B[0,,j-1])
   1.2 if both characters are not same, LCS[i][j] = 0,
       ( Because if characters are not same, there cannot be any
         common substring including A[i] and B[j]. 
2. Traverse the matrix and find the maximum element in it, that will be the length of Longest Common Substring.
(This step can be optimized by keeping track of max length while calculating LCS[i][j]).

Implementation

Table filled by above code is as

Longest common substring using dynamic programming

Time complexity of dynamic programming approach to find length of longest common substring in two string is O(n*m) and space complexity is O(n*m) where n and m are lengths of two given strings.

In next post we will discuss suffix tree method to find LCS which is more optimized than DP solution and can be easily be generalized for multiple strings.

Please share if you find something wrong or missing. If you want to contribute to site, please refer : Publish on Algorithms and Me and contact us. We would be happy to publish your work and in turn will pay you too.

Longest Substring Without Repeating Characters

Longest Substring Without Repeating Characters

Given a string, find longest substring without repeating characters in it. For example, S = “abcaabaca”, longest substring without repeating characters will be “abc”

Brute force solution will be to scan all substrings of given string and check which one has longest length and no repeating characters. For a string with size n, there will be n * (n-1) substrings, and to check it each for unique characters, it will take n comparison in worst case. So, worst case complexity of this algorithm is O(n3) with additional space of O(n). Code is simple enough.

#include <stdio.h>
#include <string.h>
 
#define true 1
#define false 0
 
#define MAX(a, b) a > b ? a:b
 
unsigned int checkIfAllUniqueCharacters(char *s, int start, int end) {
	unsigned int table[255];
 
	for(int i=0; i<255; i++){
		table[i] = false;
	}
	for (int i = start; i < end; i++) {
	    char ch = s[i];
	    if (table[ch-'a']) return false;
 
            table[ch-'a'] = true;
    }
    return true;
}
 
int longestSubstringWithoutRepeatingCharacters(char *s) {
	int n = strlen(s);
	int maxLength = 0;
 
	for (int i = 0; i < n; i++){
	    for (int j = i + 1; j <= n; j++){
		int length = j-i;
		if (checkIfAllUniqueCharacters(s, i, j)){
			maxLength = MAX(maxLength, length);
		}
	    }
	}
	return maxLength;
}
 
 
int main(void) {
 
	char *s ="abcabcbb";
	printf("Longest substring without repeating characters : %d", 
		longestSubstringWithoutRepeatingCharacters(s));
 
	return 0;
}

Longest Substring Without Repeating Characters: Sliding window approach

A sliding window is an abstract concept commonly used in array/string problems. A window is a range of elements in the array/string which usually defined by the start and end indices. A sliding window is a window “slides” its two boundaries to the certain direction.

In brute force approach, we repeatedly checked each substring for unique characters. Do we need to check each substring? If a substring s[i,j-1] contains non repeating characters, while adding jth character, check if that character is already present in substring s[i,j]. Since we are scanning substring to ascertain uniqueness of new character being added, complexity of this algorithm is O(n2).

How about optimizing the scanning part? What if we use a hash to store characters which are already seen in substring s[i,j], then checking uniqueness is done in O(1) and hence overall algorithm complexity becomes linear. Here is how to use hash to find solution.

Store characters of substring[i,j] (sliding window) in hash. When new character is processed, check if that character is present in hash,If character is not present in hash, add it to hash and increment j. If character already present in substring s[i,j], that means, it cannot be added to longest substring. Find length of substring (j-i) and compare it with current max, if it is greater, max length of longest substring without repeating characters is (j-i). Also, window also moves from i to i+1 and check for longest substring from i+1 index.

Longest substring without repeating characters implementation

#include <stdio.h>
#include <string.h>
 
#define true 1
#define false 0
 
#define MAX(a, b) a > b ? a:b
 
char * substr(char *s, int start, int end){
	char *dest =  (char *)malloc(end-start+1);
	int k = 0;
	int i = start;
	while(i < end){
	    dest[k++] = s[i++];
	}
	return dest;
}
int longestSubstringWithoutRepeatingCharacters(char *s) {
	int n = strlen(s);
	unsigned int table[255];
 
	for(int i=0; i<255; i++){
		table[i] = false;
	}
 
	int maxLength = 0, i = 0, j = 0;
	while (i < n && j < n) {
		if (!table[s[j] -'a']){
			table[s[j++] -'a'] = true;
			int length = j-i;
			maxLength = MAX(maxLength, length);
		}
		else {
			table[s[i++] -'a'] = false;
		}
	}
	return maxLength;
}
 
int main(void) {
 
	char *s ="abcabcbb";
	printf("Longest substring without repeating characters : %d", 
			longestSubstringWithoutRepeatingCharacters(s));
 
	return 0;
}

As mentioned above, complexity of this method is O(n), with additional space complexity of O(n).

Below is example execution of above code.

Current Character : a
Substring (  ) does not contain a
New length of substring without repeating character : 1
Current Character : b
Substring ( a ) does not contain b
New length of substring without repeating character : 2

Current Character : c
Substring ( ab ) does not contain c
New length of substring without repeating character : 3

Current Character : a
Substring (abc) contains a
Advancing i to 1

Current Character : a
Substring ( bc ) does not contain a
New length of substring without repeating character : 3

Current Character : b
Substring (bca) contains b
Advancing i to 2

Current Character : b
Substring ( ca ) does not contain b
New length of substring without repeating character : 3

Current Character : c
Substring (cab) contains c
Advancing i to 3

Current Character : c
Substring ( ab ) does not contain c
New length of substring without repeating character : 3

Current Character : b
Substring (abc) contains b
Advancing i to 4

Current Character : b
Substring (bc) contains b
Advancing i to 5

Current Character : b
Substring ( c ) does not contain b
New length of substring without repeating character : 3

Current Character : b
Substring (cb) contains b
Advancing i to 6

Current Character : b
Substring (b) contains b
Advancing i to 7

Current Character : b
Substring (  ) does not contain b
New length of substring without repeating character : 3

Longest substring without repeating characters : 3

There is a small optimization which helps us to skip more characters when repeating character is found instead skipping one at a time. Store the index of each character seen in substring [i,j-1].  While processing jth character, if it is already in hash, we know the index j’ where that character is in string. There is no way that any substring can contain unique characters till j’ and j are in it. So, we skip all indices from i to j’ and start from j’+1 instead of i+1 as in above method.

#include <stdio.h>
#include <string.h>
 
#define true 1
#define false 0
 
#define MAX(a, b) a > b ? a:b
 
int longestSubstringWithoutRepeatingCharacters(char *s) {
	int n = strlen(s);
	unsigned int table[255];
 
	for(int i=0; i<255; i++){
		table[i] = false;
	}
 
	int maxLength = 0, i = 0, j = 0;
	for (int j = 0, i = 0; j < n; j++) {
		if (table[s[j]-'a']) {
			int currentIndex = table[s[j]-'a'];
			i = MAX(currentIndex, i);
		}
		int currentLength = j - i + 1;
		maxLength = MAX(maxLength, currentLength);
            table[s[j]-'a'] = j + 1;
	}
	return maxLength;
}
 
int main(void) {
 
	char *s ="abcabcbb";
	printf("Longest substring without repeating characters : %d", 
			longestSubstringWithoutRepeatingCharacters(s));
 
	return 0;
}

Complexity of find longest substring without repeating character is hence O(n) with additional space complexity of O(n).

Please share if something is wrong or missing. We would love to hear from you.

Number of ways to fill or construct wall of N X 4 with 4 X 1 bricks

Let me clarify the question, as shown in figure 1 X 4 brick can be used as 4 X 1 brick also, and we have a wall of N X 4 or 4 X N unit.

brick_wallTo fill this wall with given brick, we can either put brick standing, or horizontally. Going further, we will take above wall as input and not as N X 4, although solution remains the same
Now, if I have a wall with N =1, i.e. 4 X 1 wall, there is only one possible way I can fill that way, and that is to lay brick horizontally on the wall. Solutions = 1
What if N = 2, i.e. wall as 4 X 2, then I can put two bricks horizontally, that is the only way, I can put brick in vertical as it will go our of the wall. Solutions = 1
Take N  =3, i.e. wall with 4 X 3, only way we can fill the wall is to put three bricks horizontally, can’t use vertical brick. So, solutions = 1
What if N =4, wall with 4 X 4 dimensions, in this scenarios, there are two cases, we put four bricks horizontally or we put four bricks vertically, so the number of solution for 4 X 4 is 2.

Figure explains what I wrote above

brick_wall1If we take number of solutions as f(n) then f(n) for values 1,2 and 3 is as follows.

f(1) =1, f(2)=1, f(3)=1

We have two choices for each brick for wall of size greater than 4 X 3.  Either we keep brick vertically or we keep brick horizontally. If we keep brick vertically, we cover four units out of N units height of wall with each brick, require four vertical bricks to cover horizontally, so problem reduces to N-4 units. If we keep brick horizontally, then it covers only 1 unit height of wall, hence we need to cover N-1 units of height further.
So, for n we have relationship as

f(n) = f(n-1)  + f(n-4)

Now we have recurrence relation and based condition, it should be easy to implement it with recursion.

However, if we look at the solution closely, we can find that many of the sub problems are being solved again and again, that means sub problems are overlapping. Remember Fibonacci series? So typical case of dynamic programming. Below code provides solution with Dynamic programming.

This is the simple implementation in dynamic programming, to solve the problem of finding number of ways in which N X 4 wall can be constructed or mapped with 4 X 1 or 1 X 4 bricks.
If you find some issue in understanding it, please leave a comment, I will elaborate it more. Otherwise you can share and spread the knowledge.

Check if string is interleaved of two strings

Given three strings A,B and C, find if C string is interleaved of two strings. A and B. C is said to be interleaved if it contains all the characters of A and B and order of characters in respective strings is maintained. For example string C in figure is interleaved string of A and B.

check string is interleaved of two strings

Find if string is interleaved of two strings

Simple algorithm to find if a string is interleaved of two other string is to start checking length of C as length of A + length of B. If it is not true return false (why?).

If length criteria passes, check first character of C and A. If they match, move to second character of C and A. Keep B at first character. If character in A does not match with character in C, check if first character of C and B match, if they do, then move to second character of C and B. A remains at first character.

Once done, repeat above steps with next characters of strings, while character in C matches character in A or B. If both current character in C does not match with any of the current characters in A and B, return false.

With above algorithm, with each comparison, problem reduces to smaller size, with C with length N-1, one of A or B with M-1. From the description, it is clear that recursion can be used to solve interleaving problem.

isInterleaved(A,B,C)  = isInterleaved(A+1, B, C+1) 
                        || isInterleaved(A, B+1,C+1)

What shall be base case for above recursion?
If end of C is reached, all characters in C are checked, return true if all characters in other two strings are also considered. If not returned false.
Other way, if both string are A and B are fully considered and there are still characters left in C to be considered, then return false. (This case is anyways caught at length check).

Recursive Implementation of interleaved strings algorithm


#include <stdio.h>
#include <string.h>
#define true 1
#define false 0; 
 
int isInterleaved(char *c, char *a, char *b){
 
    if(!(*c) && !(*a) && !(*b))
        return true;
 
    if(*c == '\0'){
        return false;
    }
	 // if character of a and c match
    return ((*c == *a) && isInterleaved(c+1,a+1,b)) || 
    		// if character of b and c match
            ((*c == *b) && isInterleaved(c+1,a,b+1)); 
}
 
int main(void) {
	char *c = "abxycdz";
	char *a = "abcd";
	char *b = "xyz";
	if(isInterleaved(c,a,b)){
	    printf("\n String is interleaved");
	}
	else{
	    printf("\n String is not interleaved");
	}
	return 0;
}

Iterative implementation is as follows

#include <stdio.h>
#include <string.h>
#define true 1
#define false 0; 
 
int isInterleaved(char *c, char *a, char *b){
 
    while(*c != '\0'){
    	if(*c == *a){
            a++;
        }
        else if(*c == *b){
            b++;
        }
        else{
            return false;
        }
        c++;
    }
    if(*a != '\0' || b != '\0')
        return false;
 
    return true;
}
 
int main(void) {
	// your code goes here
	char *c = "abxycdz";
	char *a = "abcd";
	char *b = "xyz";
	if(isInterleaved(c,a,b)){
		printf("\n String is interleaved");
	}
	else{
		printf("\n String is not interleaved");
	}
	return 0;
}

Iterative implementation will not work with input with common characters in string A and B, for example A = XXY and B = XXZ and if C = XXZXXY will return false.

Complexity of iterative implementation will be linear O(N), N being length of string C, where as complexity of recursive solution will b O(2N) but it does not fail in when two strings have common character.

Dynamic programming approach to interleaved strings problem

Look closely and notice that many subproblems are being calculated again and again. Recursion tree for input A = XXY and B = XXZ and C = XXZXXY

interleaved strings with dynamic programming

Looking at the recursion tree, it is clear that if results of subproblems are stored, they need not be calculated again and again. Create a two dimensional table. Table(i,j) = true only if C[i+j-1] is interleaved string if A[i] and B[j]. Empty string is interleaved of two other strings so, Table[0][0] = true
If one of the strings was empty: Table(i,0) = A[i] == C[i] && Table(i-1, 0) that is to say if till i-1 characters C was interleaved of A, then for ith character it will be true if ith character matches ith character of A. Note that B is empty. Again if string A is empty, then Table(0,j) = Table(0, j-1). With these base cases, we can fill table bottom up as follows

Table(i,j) = Table(i-1,j)  if (A[i] == C[i+j]) && (B[j] != C[i+j])
Table(i,j) = Table(i,j-1) (B[i] == C[i+j])&& (A[i] != C[i+j])
Table(i,j) = Table(i-1,j) || Table(i, j-1) if (A[i] == C[i+j]) && (B[j] == C[i+j])

Dynamic programming solution to find if string is interleaved of two strings

#include <stdio.h>
#include <string.h>
#define true 1
#define false 0; 
int isInterleaved(char *c, char *a, char *b){
 
    int lenA = strlen(a);
    int lenB = strlen(b);
    int i,j;
 
    int Table[lenA+1][lenB+1];
    // Initialization
	for(i=0; i<=lenA; i++){
        for(j=0; j<=lenB; j++){
        	Table[i][j] = false;
        }
	}
    for(i=0; i<=lenA; i++){
        for(j=0; j<=lenB; j++){
        	// Both strings are empty
            if(i==0 && j==0)
                Table[i][j] = true;
    		// string A is empty, compare characters in C and B
            if(i==0 && c[j-1] == b[j-1]){
                Table[i][j] =  Table[i][j-1];
            }
            // string B is empty, compare characters in C and A
	        else if(j==0 && c[i-1] == a[i-1]){
                Table[i][j] =  Table[i-1][j];
            }
            // Both strings are not empty
            //1. If character of A matches with character of C
            // but not of B
            else if (a[i-1] == c[i+j-1] && b[j-1] != c[i+j-1]){
                Table[i][j] = Table[i-1][j];
            }
            //2. If character of B matches with character of C
            // but not of A
            else if (a[i-1] != c[i+j-1] && b[j-1] == c[i+j-1]){
                Table[i][j] = Table[i][j-1];
            }
            //1. If character of A matches with character of C
            // and character of B also matches with C
            else if (a[i-1] == c[i+j-1] && b[j-1] == c[i+j-1]){
                Table[i][j] = Table[i-1][j] || Table[i][j-1];
            }
        }
    }
    return Table[lenA][lenB];
}
 
int main(void) {
	// your code goes here
	char *c = "abxycdz";
	char *a = "abcd";
	char *b = "xyz";
	if(isInterleaved(c,a,b)){
		printf("\n String is interleaved");
	}
	else{
		printf("\n String is not interleaved");
	}
	return 0;
}

Complexity of dynamic programming algorithm to find if a string is interleaved of two strings will be O(n2).

For more dynamic programming problems, please refer : Dynamic programming problems

Please share if there is something wrong or missing. We would love to hear what you have to say. If you are interested in contribute to website, please contact us.