Given two strings A and B, find the longest common substring in them. For example, if A = “DataStructureandAlgorithms” and B = “Algorithmsandme”, then the longest common substring of A and B is “Algorithms”.

The brute force solution is to generate all substrings of one string and check whether each of them is a substring of the second string, keeping track of the longest match found so far. A string of length n has O(n^{2}) substrings, and checking whether one of them occurs in the other string takes on the order of m more operations, where m is the length of the second string. Hence, the overall complexity of this method is O(n^{2}m).
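The brute force idea can be sketched in C as below. This is only an illustration under a few assumptions: the function name `bruteForceLCSLength` is made up for this example, and the inner scan uses a plain `memcmp` at every position of B, so its worst-case cost can exceed the estimate above. It also stops extending a substring as soon as it is not found in B, since no longer substring from the same start can match either.

```c
#include <string.h>

/* Hypothetical helper for illustration: try every substring
   A[start .. start+len-1] and scan B for an occurrence of it. */
int bruteForceLCSLength(const char *A, const char *B) {
    int n = (int)strlen(A);
    int m = (int)strlen(B);
    int best = 0;

    for (int start = 0; start < n; start++) {
        for (int len = 1; start + len <= n; len++) {
            int found = 0;
            /* Naive search: compare the candidate at every position of B */
            for (int pos = 0; pos + len <= m; pos++) {
                if (memcmp(A + start, B + pos, (size_t)len) == 0) {
                    found = 1;
                    break;
                }
            }
            /* If this candidate is absent, longer ones from the same
               start contain it as a prefix and are absent too */
            if (!found) {
                break;
            }
            if (len > best) {
                best = len;
            }
        }
    }
    return best;
}
```

Calling `bruteForceLCSLength("DataStructureandAlgorithms", "Algorithmsandme")` should give 10, the length of “Algorithms”.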

Can we do better than that?

# Longest common substring using dynamic programming

This solution is very similar to the longest common subsequence problem. The difference between the two problems is that a subsequence is a collection of characters that may or may not be contiguous in the string, whereas the characters of a substring must be contiguous. Because of this difference, our solution will vary a bit.

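Let’s create a two-dimensional array called LCS with dimensions (n+1) × (m+1), where n and m are the lengths of A and B. LCS[i][j] represents the length of the longest common substring that ends exactly at the i-th character of A and the j-th character of B, in other words the longest common suffix of the first i characters of A and the first j characters of B. Row 0 and column 0 stand for the empty prefix.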

As in the case of longest common subsequence, we will start with the smallest case first. What if one of the strings is empty? Then there is no common substring at all, so its length is 0. Hence, LCS[i][0] = 0 and LCS[0][j] = 0.

How to fill LCS[i][j]?

1. For i ≥ 1 and j ≥ 1, check whether the i-th character of A is equal to the j-th character of B (A[i-1] and B[j-1] with 0-based string indexing).
   1.1 If yes, LCS[i][j] = 1 + LCS[i-1][j-1], because the new character extends the common substring, if any, that ends just before it in both A and B.
   1.2 If the characters are not the same, LCS[i][j] = 0, because there cannot be any common substring ending with both of these characters.
2. Traverse the matrix and find the maximum element in it; that is the length of the longest common substring. (This step can be optimized by keeping track of the maximum length while calculating LCS[i][j].)
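The rules above can be summarized as a single recurrence. The notation here is an assumption made for this summary: the strings are indexed from 0, the table from 0 to n and 0 to m, so cell (i, j) looks at characters A[i-1] and B[j-1].

```latex
LCS[i][j] =
\begin{cases}
0                 & \text{if } i = 0 \text{ or } j = 0 \\
1 + LCS[i-1][j-1] & \text{if } i, j \ge 1 \text{ and } A[i-1] = B[j-1] \\
0                 & \text{otherwise}
\end{cases}
\qquad
\text{answer} = \max_{0 \le i \le n,\; 0 \le j \le m} LCS[i][j]
```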

Implementation

```c
#include <stdio.h>
#include <string.h>

int max(int a, int b) {
    return a > b ? a : b;
}

int longestCommonSubstring(const char *A, const char *B) {
    int lenA = (int)strlen(A);
    int lenB = (int)strlen(B);

    /* LCS[i][j]: length of the common substring ending at A[i-1] and B[j-1] */
    int LCS[lenA + 1][lenB + 1];

    /* An empty prefix has no common substring with anything */
    for (int i = 0; i <= lenA; i++) {
        LCS[i][0] = 0;
    }
    for (int j = 0; j <= lenB; j++) {
        LCS[0][j] = 0;
    }

    int maxLength = 0;
    for (int i = 1; i <= lenA; i++) {
        for (int j = 1; j <= lenB; j++) {
            /* Table indices are offset by one from string indices */
            if (A[i - 1] == B[j - 1]) {
                LCS[i][j] = 1 + LCS[i - 1][j - 1];
                maxLength = max(maxLength, LCS[i][j]);
            } else {
                LCS[i][j] = 0;
            }
        }
    }
    return maxLength;
}

int main(void) {
    const char *a = "ABCDEFGSE";
    const char *b = "EBCDEFGV";
    printf("Longest common substring : %d\n", longestCommonSubstring(a, b));
    return 0;
}
```
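The implementation above returns only the length. If the substring itself is needed, one possible approach is to remember where the best match ends while filling the table. The sketch below assumes a hypothetical helper named `longestCommonSubstringText`, allocates the table on the heap instead of the stack, and returns a newly allocated string that the caller must free. When several substrings share the maximum length, it returns the first one found.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration: returns a malloc'd copy of one
   longest common substring of A and B, or NULL if allocation fails. */
char *longestCommonSubstringText(const char *A, const char *B) {
    size_t lenA = strlen(A);
    size_t lenB = strlen(B);
    size_t maxLength = 0;
    size_t endInA = 0; /* index one past the last matched character in A */

    /* (lenA+1) x (lenB+1) table stored row by row; calloc zeroes it,
       so row 0, column 0 and all mismatch cells are already 0 */
    int *LCS = calloc((lenA + 1) * (lenB + 1), sizeof *LCS);
    if (LCS == NULL) {
        return NULL;
    }

    for (size_t i = 1; i <= lenA; i++) {
        for (size_t j = 1; j <= lenB; j++) {
            if (A[i - 1] == B[j - 1]) {
                int len = 1 + LCS[(i - 1) * (lenB + 1) + (j - 1)];
                LCS[i * (lenB + 1) + j] = len;
                if ((size_t)len > maxLength) {
                    maxLength = (size_t)len;
                    endInA = i;
                }
            }
        }
    }
    free(LCS);

    /* Copy the maxLength characters of A that end just before endInA */
    char *result = malloc(maxLength + 1);
    if (result == NULL) {
        return NULL;
    }
    memcpy(result, A + endInA - maxLength, maxLength);
    result[maxLength] = '\0';
    return result;
}
```

For the strings used in `main` above, "ABCDEFGSE" and "EBCDEFGV", this sketch should return "BCDEFG".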

The time complexity of the dynamic programming approach to find the length of the longest common substring of two strings is O(n*m), and the space complexity is also O(n*m), where n and m are the lengths of the two given strings.
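Since each cell depends only on the cell diagonally above it, the space can be reduced if only the length is needed. The following is a minimal sketch of that idea, not part of the implementation above: the function name `longestCommonSubstringTwoRows` and the -1 return value on allocation failure are assumptions made for this example. It keeps just the previous and the current row, so it uses O(m) extra space while the running time stays O(n*m).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant for illustration: same recurrence as the full
   table, but only two rows of length lenB+1 are kept in memory. */
int longestCommonSubstringTwoRows(const char *A, const char *B) {
    size_t lenA = strlen(A);
    size_t lenB = strlen(B);
    int maxLength = 0;

    /* calloc zeroes both rows, which covers row 0 and column 0 */
    int *prev = calloc(lenB + 1, sizeof *prev);
    int *curr = calloc(lenB + 1, sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1; /* assumed error value for allocation failure */
    }

    for (size_t i = 1; i <= lenA; i++) {
        for (size_t j = 1; j <= lenB; j++) {
            if (A[i - 1] == B[j - 1]) {
                curr[j] = 1 + prev[j - 1];
                if (curr[j] > maxLength) {
                    maxLength = curr[j];
                }
            } else {
                curr[j] = 0;
            }
        }
        /* The current row becomes the previous row for the next i */
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    free(prev);
    free(curr);
    return maxLength;
}
```

The trade-off is that the full table is no longer available afterwards, so recovering the substring itself would require tracking the end position during the scan.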

In the next post, we will discuss the suffix tree method to find the longest common substring, which is more efficient than the DP solution and can easily be generalized to multiple strings.
