Longest Substring Without Repeating Characters

Longest Substring Without Repeating Characters

Given a string, find longest substring without repeating characters in it. For example, S = “abcaabaca”, longest substring without repeating characters will be “abc”

Brute force solution will be to scan all substrings of given string and check which one has longest length and no repeating characters. For a string with size n, there will be n * (n-1) substrings, and to check it each for unique characters, it will take n comparison in worst case. So, worst case complexity of this algorithm is O(n3) with additional space of O(n). Code is simple enough.

#include <stdio.h>
#include <string.h>
 
#define true 1
#define false 0
 
#define MAX(a, b) a > b ? a:b
 
unsigned int checkIfAllUniqueCharacters(char *s, int start, int end) {
	unsigned int table[255];
 
	for(int i=0; i<255; i++){
		table[i] = false;
	}
	for (int i = start; i < end; i++) {
	    char ch = s[i];
	    if (table[ch-'a']) return false;
 
            table[ch-'a'] = true;
    }
    return true;
}
 
int longestSubstringWithoutRepeatingCharacters(char *s) {
	int n = strlen(s);
	int maxLength = 0;
 
	for (int i = 0; i < n; i++){
	    for (int j = i + 1; j <= n; j++){
		int length = j-i;
		if (checkIfAllUniqueCharacters(s, i, j)){
			maxLength = MAX(maxLength, length);
		}
	    }
	}
	return maxLength;
}
 
 
int main(void) {
 
	char *s ="abcabcbb";
	printf("Longest substring without repeating characters : %d", 
		longestSubstringWithoutRepeatingCharacters(s));
 
	return 0;
}

Longest Substring Without Repeating Characters: Sliding window approach

A sliding window is an abstract concept commonly used in array/string problems. A window is a range of elements in the array/string which usually defined by the start and end indices. A sliding window is a window “slides” its two boundaries to the certain direction.

In brute force approach, we repeatedly checked each substring for unique characters. Do we need to check each substring? If a substring s[i,j-1] contains non repeating characters, while adding jth character, check if that character is already present in substring s[i,j]. Since we are scanning substring to ascertain uniqueness of new character being added, complexity of this algorithm is O(n2).

How about optimizing the scanning part? What if we use a hash to store characters which are already seen in substring s[i,j], then checking uniqueness is done in O(1) and hence overall algorithm complexity becomes linear. Here is how to use hash to find solution.

Store characters of substring[i,j] (sliding window) in hash. When new character is processed, check if that character is present in hash,If character is not present in hash, add it to hash and increment j. If character already present in substring s[i,j], that means, it cannot be added to longest substring. Find length of substring (j-i) and compare it with current max, if it is greater, max length of longest substring without repeating characters is (j-i). Also, window also moves from i to i+1 and check for longest substring from i+1 index.

Longest substring without repeating characters implementation

#include <stdio.h>
#include <string.h>
 
#define true 1
#define false 0
 
#define MAX(a, b) a > b ? a:b
 
char * substr(char *s, int start, int end){
	char *dest =  (char *)malloc(end-start+1);
	int k = 0;
	int i = start;
	while(i < end){
	    dest[k++] = s[i++];
	}
	return dest;
}
int longestSubstringWithoutRepeatingCharacters(char *s) {
	int n = strlen(s);
	unsigned int table[255];
 
	for(int i=0; i<255; i++){
		table[i] = false;
	}
 
	int maxLength = 0, i = 0, j = 0;
	while (i < n && j < n) {
		if (!table[s[j] -'a']){
			table[s[j++] -'a'] = true;
			int length = j-i;
			maxLength = MAX(maxLength, length);
		}
		else {
			table[s[i++] -'a'] = false;
		}
	}
	return maxLength;
}
 
int main(void) {
 
	char *s ="abcabcbb";
	printf("Longest substring without repeating characters : %d", 
			longestSubstringWithoutRepeatingCharacters(s));
 
	return 0;
}

As mentioned above, complexity of this method is O(n), with additional space complexity of O(n).

Below is example execution of above code.

Current Character : a
Substring (  ) does not contain a
New length of substring without repeating character : 1
Current Character : b
Substring ( a ) does not contain b
New length of substring without repeating character : 2

Current Character : c
Substring ( ab ) does not contain c
New length of substring without repeating character : 3

Current Character : a
Substring (abc) contains a
Advancing i to 1

Current Character : a
Substring ( bc ) does not contain a
New length of substring without repeating character : 3

Current Character : b
Substring (bca) contains b
Advancing i to 2

Current Character : b
Substring ( ca ) does not contain b
New length of substring without repeating character : 3

Current Character : c
Substring (cab) contains c
Advancing i to 3

Current Character : c
Substring ( ab ) does not contain c
New length of substring without repeating character : 3

Current Character : b
Substring (abc) contains b
Advancing i to 4

Current Character : b
Substring (bc) contains b
Advancing i to 5

Current Character : b
Substring ( c ) does not contain b
New length of substring without repeating character : 3

Current Character : b
Substring (cb) contains b
Advancing i to 6

Current Character : b
Substring (b) contains b
Advancing i to 7

Current Character : b
Substring (  ) does not contain b
New length of substring without repeating character : 3

Longest substring without repeating characters : 3

There is a small optimization which helps us to skip more characters when repeating character is found instead skipping one at a time. Store the index of each character seen in substring [i,j-1].  While processing jth character, if it is already in hash, we know the index j’ where that character is in string. There is no way that any substring can contain unique characters till j’ and j are in it. So, we skip all indices from i to j’ and start from j’+1 instead of i+1 as in above method.

#include <stdio.h>
#include <string.h>
 
#define true 1
#define false 0
 
#define MAX(a, b) a > b ? a:b
 
int longestSubstringWithoutRepeatingCharacters(char *s) {
	int n = strlen(s);
	unsigned int table[255];
 
	for(int i=0; i<255; i++){
		table[i] = false;
	}
 
	int maxLength = 0, i = 0, j = 0;
	for (int j = 0, i = 0; j < n; j++) {
		if (table[s[j]-'a']) {
			int currentIndex = table[s[j]-'a'];
			i = MAX(currentIndex, i);
		}
		int currentLength = j - i + 1;
		maxLength = MAX(maxLength, currentLength);
            table[s[j]-'a'] = j + 1;
	}
	return maxLength;
}
 
int main(void) {
 
	char *s ="abcabcbb";
	printf("Longest substring without repeating characters : %d", 
			longestSubstringWithoutRepeatingCharacters(s));
 
	return 0;
}

Complexity of find longest substring without repeating character is hence O(n) with additional space complexity of O(n).

Please share if something is wrong or missing. We would love to hear from you.