HashMaps and Fibonacci

This time, we take up the simplest of all algorithms: the Fibonacci series. The algorithm itself is simple, but its performance depends heavily on the implementation. We all know the loop approach, so we skip it and instead write a recursive implementation in C++. We use a hashmap to store the intermediate results so we don’t end up with O(2^n) time complexity and keep it at O(n). Of course, the space complexity in this case is also O(n), since we are storing the intermediate results.

So without further ado, the code is as below.

#include <map>
#include <stdexcept>      // std::out_of_range
#include <ctime>
#include <iostream>
 
namespace Fibonacci
{
class Fibonacci
{
    public:
        Fibonacci();
        virtual ~Fibonacci();
        unsigned long long calcFibonacci(unsigned int number);
    private:
        std::map<unsigned int, unsigned long long> *m_FibMap;
};
 
Fibonacci::Fibonacci()
{
    m_FibMap = new std::map<unsigned int, unsigned long long>();
}
 
Fibonacci::~Fibonacci()
{
    delete m_FibMap;
}
 
unsigned long long Fibonacci::calcFibonacci(unsigned int number)
{
    unsigned long long result{};
    if(number == 0 || number == 1)
    {
        // Seed and return the base cases directly.
        (*m_FibMap)[0] = 0; (*m_FibMap)[1] = 1;
        return m_FibMap->at(number);
    }
    if(m_FibMap->find(number) != m_FibMap->end())
    {
        result = m_FibMap->at(number);
    }
    else
    {
        (*m_FibMap)[number] = calcFibonacci(number - 1) + calcFibonacci(number - 2);
        result = m_FibMap->at(number);
    }
    return result;
}
}
 
int main()
{
    unsigned int number{90}; // Fibonacci(93) is the largest value that fits in unsigned long long
    Fibonacci::Fibonacci fib;
 
    clock_t ticks = clock();
    std::cout << std::endl << "Fibonacci for " << number << " is " << fib.calcFibonacci(number) << std::endl;
    ticks = clock() - ticks;
    std::cout << std::endl << "Execution time for Fib Calculation for " << number << " is " << ticks
        << " ticks i.e. " << (((float)ticks)/CLOCKS_PER_SEC) << " seconds" << std::endl;
}

The code is very easy to understand. Basically, we store the results of already calculated values in a map and reuse them for further calculations. Hence we get lower runtime complexity at the cost of a bit more space. I have also added code to measure the number of clock ticks the calculation takes, along with the elapsed time in seconds.

The code will start showing its awesomeness once you use the same instance of the class for multiple Fibonacci calculations. The tick count will go down drastically, since all numbers lower than the previous input will already be present in the hashmap, making each of them a simple lookup. Numbers larger than previously calculated will still be computed, but that too happens quickly.

Points to keep in mind

Also, we have a limit on unsigned long long (Fibonacci(93) is the largest value that fits), so don’t forget about it. If you really want larger numbers, I suggest BigInt libraries, but I don’t want to pull those into ideone. That is a side project for your local machines. I will soon open source a repository of all these code samples on gitlab.com which can be accessed and contributed to as needed.

Visit me at: https://www.naresh.se/

Follow me on Twitter: @wolverine2k

Program to check if a string is a Pangram

What is a Pangram?

A pangram is a sentence containing every letter of the alphabet. Our job today is to find whether a given string is a pangram or not. This is an O(n) time problem, with n being the length of the string, since we need to traverse the whole string to know whether every letter appears. One can add an exit condition where, once all letters have been found, we ignore the rest of the string, but that means extra checks after each letter found. It is not worth the extra computation unless we are talking about very big strings. Let’s try to tackle that in part 2.

Our simple code converts all characters in the given string to lower case first, using the std::transform function from <algorithm>. We also declare a map of all 26 letters with values initially set to false. Now our task is pretty simple: we iterate through the string and, whenever a letter is encountered, mark its value in the map as true. Accessing map elements is fast (O(log k) for std::map, with k fixed at 26 here, so effectively constant), and indexing into the string is O(1). We need to iterate over the whole string. Lastly, we iterate over our map and check whether all entries are set to true. If yes, the string is a pangram; otherwise it is not.

Code to find if sentence is pangram or not?

Remember that we skip constants when calculating space and time complexity. Time complexity in this case is O(n), and the extra space (the fixed 26-entry map) is constant, O(1).

Visit me at: https://www.naresh.se/
Follow me on Twitter: @wolverine2k

This post is contributed by Naresh, please contact us if you are willing to contribute to algorithms and me.

Reverse a string O(n/2) time, O(n) space

Reverse a string

We all know how to reverse a string, right? But the quest to find something better than existing algorithms always keeps us excited. Before jumping into the code, let’s work out the best case, i.e. Ω, for this problem. It tells us the best possible time in which the problem can be solved; nothing faster than that can be achieved.

Algorithm to reverse a string

1. Iterate from the beginning of the string to n/2 
2. Swap the boundary elements (the i-th from the front with the i-th from the back) 
3. If the string is of even length, continue till n/2, swapping each pair 
4. Else leave the middle element as it is (i.e. iterate to (n-1)/2)

This algorithm to reverse a string gives the best possible runtime of Ω(n/2). Can we do any better? It looks like not, since we must traverse half the string in any case. So, if we implement an algorithm with a Big-O of n/2 as well, then our Θ, i.e. average time, will also be n/2. That’s a relief: we have linear time complexity growth. Since we swap in place and allocate nothing extra, the string itself is the only storage, O(n) total with O(1) extra space. I have not been able to find anything faster than this, but write in the comments if there is something. I am thinking about dynamic programming or multi-threading, but we will address that in the next post.

Let’s see if I can explain it better. Suppose the string is “ABCDE”. Our loop will iterate 2 times (5/2 = 2). In the first iteration, A is swapped with E; in the second, B is swapped with D. The middle element C remains in its place. So the reversed string is “EDCBA”, which is exactly what we want. Try the same with an even-length string and it will still work.

Now the code! I am using C++ and will use C++14 where possible. Of course we can use std::reverse as well, but that one has time complexity O(n); check the logic here. The code for our logic is as below:

Cheers,
Naresh

visit me at: https://www.naresh.se/
Follow me on twitter: @wolverine2k

Thanks Naresh for contributing to Algorithms and Me, if you want to contribute, please contact us and refer publishing at Algorithms and Me.

Longest common substring -Dynamic programming

Longest common substring

Given two strings A and B, find the longest common substring in them. For example, if A = “DataStructureandAlgorithms” and B = “Algorithmsandme”, then the longest common substring in A and B is “Algorithms”.

The brute force solution is to find all substrings of one string and check whether any of them is a substring of the second string, keeping track of the longest one found while comparing. There are O(n²) substrings of a string with length n, and finding whether a string is a substring of another takes another m operations, where m is the length of the second string. Hence, the overall complexity of this method is O(n²m). Can we do better than that?

Longest common substring using dynamic programming

This solution is very similar to longest common subsequence. The difference between the two problems is that a subsequence is a collection of characters which may or may not be contiguous in the string, whereas for a substring the characters must be contiguous. Based on this difference, our solution will vary a bit.

Let’s create a two dimensional array called LCS with dimensions as n and m. LCS[i][j] represents the length of longest common substring in A[0..i] and B[0..j].

As in the case of longest common subsequence, we start with the smaller cases first. What if one of the strings is empty? If one of the strings is empty, then LCS = 0. Hence, LCS[i][0] = 0 and LCS[0][j] = 0.

How to fill LCS[i][j]?

1. Check if A[i] is equal to B[j]
   1.1 If yes, LCS[i][j] = 1 + LCS[i-1][j-1]
       (because the new character extends the common substring,
        if any, of A[0...i-1] and B[0...j-1])
   1.2 If the characters are not the same, LCS[i][j] = 0
       (because if the characters differ, there cannot be any
        common substring ending at A[i] and B[j])
2. Traverse the matrix and find the maximum element in it; that is the length of the longest common substring.
   (This step can be optimized by keeping track of the max length while calculating LCS[i][j].)

Implementation

The table filled by the above code is as follows:

Longest common substring using dynamic programming

The time complexity of the dynamic programming approach to find the length of the longest common substring of two strings is O(n*m), and the space complexity is O(n*m), where n and m are the lengths of the two given strings.

In the next post we will discuss the suffix tree method to find the LCS, which is more optimized than the DP solution and can easily be generalized to multiple strings.

Please share if you find something wrong or missing. If you want to contribute to site, please refer : Publish on Algorithms and Me and contact us. We would be happy to publish your work and in turn will pay you too.

Longest Substring Without Repeating Characters

Longest Substring Without Repeating Characters

Given a string, find the longest substring without repeating characters in it. For example, for S = “abcaabaca”, the longest substring without repeating characters will be “abc”.

The brute force solution is to scan all substrings of the given string and check which one has the longest length and no repeating characters. For a string of size n, there are O(n²) substrings, and checking each for unique characters takes n comparisons in the worst case. So, the worst case complexity of this algorithm is O(n³), with constant additional space for the character table. The code is simple enough.

#include <stdio.h>
#include <string.h>
 
#define true 1
#define false 0
 
#define MAX(a, b) ((a) > (b) ? (a) : (b))
 
unsigned int checkIfAllUniqueCharacters(char *s, int start, int end) {
	unsigned int table[256];
 
	for(int i=0; i<256; i++){
		table[i] = false;
	}
	for (int i = start; i < end; i++) {
	    /* Index by the byte value itself so any character is safe. */
	    unsigned char ch = s[i];
	    if (table[ch]) return false;
	    table[ch] = true;
	}
	return true;
}
 
int longestSubstringWithoutRepeatingCharacters(char *s) {
	int n = strlen(s);
	int maxLength = 0;
 
	for (int i = 0; i < n; i++){
	    for (int j = i + 1; j <= n; j++){
		int length = j-i;
		if (checkIfAllUniqueCharacters(s, i, j)){
			maxLength = MAX(maxLength, length);
		}
	    }
	}
	return maxLength;
}
 
 
int main(void) {
 
	char *s ="abcabcbb";
	printf("Longest substring without repeating characters : %d", 
		longestSubstringWithoutRepeatingCharacters(s));
 
	return 0;
}

Longest Substring Without Repeating Characters: Sliding window approach

A sliding window is an abstract concept commonly used in array/string problems. A window is a range of elements in the array/string, usually defined by start and end indices. A sliding window is a window that “slides” its two boundaries in a certain direction.

In the brute force approach, we repeatedly checked each substring for unique characters. Do we need to check each substring? If substring s[i, j-1] contains no repeating characters, then when adding the j-th character we only need to check whether that character is already present in s[i, j-1]. Since we still scan the substring to ascertain the uniqueness of the new character, the complexity of this algorithm is O(n²).

How about optimizing the scanning part? What if we use a hash to store the characters already seen in substring s[i, j-1]? Then the uniqueness check is done in O(1), and the overall algorithm becomes linear. Here is how to use a hash to find the solution.

Store the characters of substring s[i, j-1] (the sliding window) in the hash. When a new character is processed, check whether it is present in the hash. If it is not, add it to the hash and increment j. If it is already present in s[i, j-1], it cannot be added to the current substring: compute the window length (j - i), compare it with the current max, and update the max if it is greater. The window then moves forward, incrementing i, and we look for the longest substring starting from the new i.

Longest substring without repeating characters implementation

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
#define true 1
#define false 0
 
#define MAX(a, b) ((a) > (b) ? (a) : (b))
 
/* Helper for tracing substrings; caller must free() the result. */
char * substr(char *s, int start, int end){
	char *dest = (char *)malloc(end - start + 1);
	int k = 0;
	int i = start;
	while(i < end){
	    dest[k++] = s[i++];
	}
	dest[k] = '\0';  /* terminate the copied substring */
	return dest;
}
int longestSubstringWithoutRepeatingCharacters(char *s) {
	int n = strlen(s);
	unsigned int table[256];
 
	for(int i=0; i<256; i++){
		table[i] = false;
	}
 
	int maxLength = 0, i = 0, j = 0;
	while (i < n && j < n) {
		unsigned char ch = s[j];
		if (!table[ch]){
			/* s[j] is new: extend the window to include it. */
			table[ch] = true;
			j++;
			int length = j-i;
			maxLength = MAX(maxLength, length);
		}
		else {
			/* s[j] repeats: shrink the window from the left. */
			table[(unsigned char)s[i++]] = false;
		}
	}
	return maxLength;
}
 
int main(void) {
 
	char *s ="abcabcbb";
	printf("Longest substring without repeating characters : %d", 
			longestSubstringWithoutRepeatingCharacters(s));
 
	return 0;
}

As mentioned above, the complexity of this method is O(n), with additional space for the character table, which is constant for a fixed character set.

Below is example execution of above code.

Current Character : a
Substring (  ) does not contain a
New length of substring without repeating character : 1
Current Character : b
Substring ( a ) does not contain b
New length of substring without repeating character : 2

Current Character : c
Substring ( ab ) does not contain c
New length of substring without repeating character : 3

Current Character : a
Substring (abc) contains a
Advancing i to 1

Current Character : a
Substring ( bc ) does not contain a
New length of substring without repeating character : 3

Current Character : b
Substring (bca) contains b
Advancing i to 2

Current Character : b
Substring ( ca ) does not contain b
New length of substring without repeating character : 3

Current Character : c
Substring (cab) contains c
Advancing i to 3

Current Character : c
Substring ( ab ) does not contain c
New length of substring without repeating character : 3

Current Character : b
Substring (abc) contains b
Advancing i to 4

Current Character : b
Substring (bc) contains b
Advancing i to 5

Current Character : b
Substring ( c ) does not contain b
New length of substring without repeating character : 3

Current Character : b
Substring (cb) contains b
Advancing i to 6

Current Character : b
Substring (b) contains b
Advancing i to 7

Current Character : b
Substring (  ) does not contain b
New length of substring without repeating character : 3

Longest substring without repeating characters : 3

There is a small optimization which helps us skip more characters when a repeating character is found, instead of skipping one at a time. Store the index of each character seen in substring s[i, j-1]. While processing the j-th character, if it is already in the hash, we know the index j’ where that character last occurred. No substring containing both j’ and j can have all unique characters. So we skip all indices from i to j’ and start from j’+1, instead of i+1 as in the above method.

#include <stdio.h>
#include <string.h>
 
#define true 1
#define false 0
 
#define MAX(a, b) ((a) > (b) ? (a) : (b))
 
int longestSubstringWithoutRepeatingCharacters(char *s) {
	int n = strlen(s);
	/* table[ch] stores 1 + last index at which ch was seen, 0 if unseen. */
	unsigned int table[256];
 
	for(int i=0; i<256; i++){
		table[i] = 0;
	}
 
	int maxLength = 0;
	for (int j = 0, i = 0; j < n; j++) {
		unsigned char ch = s[j];
		if (table[ch]) {
			/* ch was last seen at table[ch]-1; the window cannot start before that. */
			i = MAX((int)table[ch], i);
		}
		int currentLength = j - i + 1;
		maxLength = MAX(maxLength, currentLength);
		table[ch] = j + 1;
	}
	return maxLength;
}
 
int main(void) {
 
	char *s ="abcabcbb";
	printf("Longest substring without repeating characters : %d", 
			longestSubstringWithoutRepeatingCharacters(s));
 
	return 0;
}

The complexity of finding the longest substring without repeating characters is hence O(n), with additional space for the character table, constant for a fixed character set.

Please share if something is wrong or missing. We would love to hear from you.

Lowest common ancestor with Range Minimum Query

Lowest common ancestor with Range Minimum Query

We have already discussed the lowest common ancestor and range minimum query problems. In this post, we will discuss how to use RMQ to find the lowest common ancestor of two given nodes. This post will concentrate on finding the lowest common ancestor with range minimum query. The LCA of (u,v) is the node furthest from the root such that both u and v are its descendants. For example, the LCA of (5,9) is 2.

LCA using RMQ

In earlier solutions, we scanned the tree each time a query was fired to find the LCA of two nodes, which has complexity O(n). If such queries are fired frequently, this operation may become a bottleneck. One way to avoid processing all nodes each time an LCA is asked for is to preprocess the tree and store that information in such a way that the LCA can be found instantly, say in O(1).

This pattern is very similar to the range minimum query problem. Can we reduce the lowest common ancestor problem to a range minimum query problem?

Reduction of LCA to RMQ

Let’s revise what RMQ is: given an array A of length n, RMQ_A(i,j) returns the index of the smallest element in the subarray A[i..j]. Suppose an algorithm has preprocessing time f(n) and query time g(n); the notation for the overall complexity of the algorithm is <f(n), g(n)>.

Let’s find the LCA of nodes 5 and 8 manually in the above graph. We notice that LCA(u,v) is the shallowest common node (in terms of distance from the root) visited between the visits to u and v during a depth first search of the tree. The important thing to note is that we are interested in the shallowest, i.e. minimum depth, node between u and v. Sounds like RMQ?

Implementation-wise, the tree is traversed as an Euler tour. There can be at most 2n-1 nodes in the Euler tour of a tree, so we store the tour in an array of size 2n-1. We also require the depth of each node during the Euler tour, so we store that in another array of size 2n-1. For each node, any occurrence in the tour would do as long as we are consistent, but we will store the index at which the node was first visited.

1. Euler[1,..,2n-1] – the nodes visited in an Euler tour of T. Euler[i] is the label of the i-th node visited in the tour. 
2. Depth[1,..,2n-1] – the levels of the nodes in the tour. Depth[i] is the level of node Euler[i] (level is defined as the distance from the root).
3. FirstVisited[1,..,n] – FirstVisited[i] holds the tour index at which node i was first visited.

An example of this computation:

lowest common ancestor using range minimum query example

To compute LCA(u,v): all nodes in the Euler tour between the first visits to u and v are E[FV[u],..,FV[v]] (assume FV[u] < FV[v]). 

The shallowest node in this sub-tour is at index RMQ_D(FV[u], FV[v]), since D[i] stores the depth of the node at E[i]. 

RMQ returns the index of the shallowest node between u and v; thus the output node is E[RMQ_D(FV[u], FV[v])], which is LCA(u,v).

lca using rmq example

lowest common ancestor using range minimum query implementation

The beauty of this algorithm to find the LCA using RMQ is that it works for any tree, not just a binary tree or BST. The complexity of the algorithm to find the lowest common ancestor using range minimum query is <O(n), O(1)> with additional space complexity of O(n).

Please share if there is something wrong or missing, we would love to hear from you. If you are interested in contributing to algorithms and me, please refer to the publisher page.

Find number of connections of person till nth level

This is a typical social network problem, where one has to find all friends of a person who are at most n steps away from that person. In other words, friends of the person are at the 1st level, friends of friends are at the 2nd level, and so on. This question was asked recently in a Microsoft interview.

For example, in the below graph, if we have to find connections of node 1 till the 3rd level, the result will be:

connections till nth level

Level 1 : 2,3,4 (friends)
Level 2 : 5,6,7,8 (friends of friends)
Level 3 : 9,10,11

Find friends of a person till Kth level

The standard way to find connections or friends in a graph is to do a breadth first search on it. Start BFS from the given person and continue until you have visited all nodes at most k edges away from the starting node. The only extra thing needed beyond standard breadth first search is to keep track of the number of levels traversed.

Implementation is quite simple.

The complexity of the algorithm to find friends of a person till the nth level is O(|V| + |E|), where V is the number of vertices and E the number of edges in the graph representation of the person's social network.

Breadth First Search (BFS) on a graph

In the last post, Depth first search, we discussed recursive and iterative ways to traverse a graph depth first. In this post, we are going to learn another traversal method: Breadth First Search (BFS).

Breadth First Search

First of all, what is breadth first search? In BFS, all neighbors of a node are visited before any neighbors of those neighbors. Basically, we traverse the graph layer by layer. For example, the BFS of the below graph would be 1->2->6->3->5->4.

breadth first search example

In depth first search, we go deep into the graph till no further move is possible, then backtrack and repeat the same process with another neighbor of the parent, and so on till there is no further node to visit.

In breadth first search, as we traverse layer by layer, no backtracking is required. Before reaching layer i, all nodes up to layer i-1 have already been visited.

Start from node S and mark it as visited. Visit all nodes which are at one edge distance from S. Once all of them are visited, take each of those neighbors in turn and repeat the same steps, until there are no unexplored edges left. The below figure explains step by step how BFS is done.

breadth first search example

Implementation-wise, BFS uses a queue, unlike DFS which uses a stack. While processing a node, all its unvisited neighbors are enqueued and processed one after the other in FIFO order. This ensures that all nodes at the same level are processed before the next level is started.

Breadth-First-Search(Graph, root):
  create empty queue Q
  mark root as visited
  Q.enqueue(root)
  while Q is not empty:
     current = Q.dequeue()
     print current->value
     for each v adjacent to current that is not visited:
         mark v as visited
         Q.enqueue(v)

The below figure explains how the breadth first implementation works.

breadth first search implementation c

Breadth first search implementation
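Following the pseudocode, a minimal adjacency-list implementation in C++ might look like this (it returns the visit order so the traversal can be inspected; names are illustrative):

```cpp
#include <queue>
#include <vector>

std::vector<int> bfs(const std::vector<std::vector<int>> &adj, int root)
{
    std::vector<int> order;
    std::vector<bool> visited(adj.size(), false);
    std::queue<int> q;
    visited[root] = true;   // mark on enqueue so each node enters the queue once
    q.push(root);
    while (!q.empty()) {
        int current = q.front(); q.pop();
        order.push_back(current);
        for (int v : adj[current]) {
            if (!visited[v]) {
                visited[v] = true;
                q.push(v);
            }
        }
    }
    return order;
}
```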

Applications of BFS
1. To find the shortest path between two nodes u and v in an unweighted graph.
2. In peer to peer networks like BitTorrent, breadth first search is used to find all neighbor nodes.
3. Search engine crawlers: each page leads to more pages linked to the current page, and so on.
4. Social networks: BFS is used to find degrees of separation, friends within distance K, etc.
5. Broadcasting protocols in networks use BFS to broadcast packets to all neighboring devices.
6. To test bipartiteness of a graph.
7. To find all nodes within one connected component of a graph.
8. To find all connected components of a graph in O(E+V): run BFS from each vertex not visited in previous runs, without resetting the visited[] array between runs. Each run discovers a new connected component, and the total running time is still O(E+V).
9. To find the solution with the least number of moves for problems where each state of a game is represented by a vertex of a graph and the transitions from one state to another are the edges.

Finding the shortest cycle in a directed unweighted graph: run a breadth-first search from each vertex; as soon as we try to go from the current vertex along some edge to an already visited vertex, we have found a cycle and can stop that traversal. Among all such cycles (one per starting vertex), choose the shortest.

The complexity of breadth first search is O(|V| + |E|), where V is the number of vertices and E the number of edges in the graph.

Please share if something is wrong or missing. If you are interested in contributing to the website and sharing your knowledge, please refer to Publishing on Algorithms and Me and contact us. We would love to publish your articles and pay you too.

Degree of separation using graphs

Degree of separation

We are all present on one or another social network: Facebook, LinkedIn, Google+ etc. On these, there are friends of ours, then friends of friends, and then friends of friends of friends. Connections between people are represented as a graph, where persons are nodes and an edge connects two persons who know each other. The number of edges on the path between two nodes is called the degree of separation.

degree of separation

What is degree of separation?

“Degrees of separation” is a concept that refers to how many people it takes to connect you to another person. Implementation-wise, it is the minimum number of edges between two persons. Simply put, it is the shortest path between two persons where each edge has a weight of 1. For example, in the above graph, the degree of separation between Sarah and Alice is 2, as Sarah is connected to Ivana and Ivana in turn is connected to Alice.

Find degree of separation between two persons

First of all, we store the information about persons and their connections in a graph. To find the degree of separation between two persons, breadth first traversal of the graph is used.

Start from one person and check all of their neighbors. If one of the neighbors is the second person, the degree of separation is 1. If not, increment the degree by 1 and then look at the neighbors of each neighbor from the last step. This goes on till either we find the second person or there is no node left to visit in the graph (which means the other person does not exist in our network).

There is a small optimization to the above algorithm: start breadth first traversal from both nodes at the same time, and while traversing levels, check whether the neighbors of one person contain the other person.

1. Given person1 = (A); person2 = (B); Init degree = 1
2. While there are nodes to be traversed
   2.1 Find all neighbors of person1
         neighborsOfPerson1 = neighbors(person1)
   2.2 If neighborsOfPerson1 contains person2, return degree
   2.3 Else increment degree
   2.4 Find all neighbors of person2
         neighborsOfPerson2 = neighbors(person2)
   2.5 If neighborsOfPerson2 contains person1, return degree
   2.6 Else increment degree

A simplistic implementation would use an adjacency matrix to store neighbor information (to understand how a graph can be represented as an adjacency matrix, please refer to the post: Graphs basics). However, an adjacency matrix wastes a lot of space, so it is better to build the graph on adjacency lists.

In the implementation, a sentinel is inserted into the queue every time a level is finished and a new level is about to be searched. At the same time, we increment the degree.
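A minimal C++ sketch of this sentinel-based traversal, assuming an adjacency-list graph (names are illustrative):

```cpp
#include <queue>
#include <vector>

// Returns the degree of separation between two persons, or -1 if not connected.
int degreeOfSeparation(const std::vector<std::vector<int>> &adj,
                       int person1, int person2)
{
    const int SENTINEL = -1;           // marks the end of a level
    std::vector<bool> visited(adj.size(), false);
    std::queue<int> q;
    q.push(person1);
    visited[person1] = true;
    q.push(SENTINEL);

    int degree = 1;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        if (u == SENTINEL) {           // finished a level, start the next one
            if (q.empty()) break;
            ++degree;
            q.push(SENTINEL);
            continue;
        }
        for (int v : adj[u]) {
            if (v == person2) return degree;
            if (!visited[v]) {
                visited[v] = true;
                q.push(v);
            }
        }
    }
    return -1;                         // person2 is not reachable
}
```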

The complexity of this algorithm to find the degree of separation between two persons is O(N), as we have to scan all nodes in the worst case. There is also a space complexity of order O(N).

Apart from this breadth first traversal, Dijkstra’s shortest path algorithm between the two nodes can also be used to determine the degree of separation.

Please share if something is wrong or missing, we would love to hear from you.

Range minimum query (RMQ)

What is Range minimum query?

Sometimes we are asked to find the index of the minimum element within a range of an array; this operation is called a range minimum query (RMQ). For example, given array A = [2,3,4,6,8,1,5,9] and asked to find the index of the minimum value between indices 2 and 7, the answer would be 5.

range minimum query

Going by brute force, every time a query is fired we scan the range and find the minimum in the given range the same way we would for the entire array. The complexity of answering each query is O(N), since in the worst case the range is the entire array.

Can we preprocess our data so that our query operations are less costly? If we do so, there are two parts to the solution: preprocessing and query. If the complexities of the two steps are f(n) and g(n) respectively, then the complexity of the solution can be denoted as <f(n), g(n)>.

What kind of preprocessing can be done? The basic idea is to calculate the minimum index of every possible range in the array. How many ranges are possible for an array with N elements? O(N²), since every pair of indices (i, j) with i ≤ j defines a range.

So, to store the minimum index of each range, O(N²) space is required, and the naive preprocessing takes O(N³) time; the query, however, is O(1). So the overall complexity of this solution is <O(N³), O(1)>.

With the application of dynamic programming, the complexity of the preprocessing step can be reduced to O(N²).

Can we do better in the preprocessing step while trading off the query step? What if we divide the array into smaller chunks and store the minimum element index of each chunk; will it help? And what should the size of a chunk be? Let’s divide the array into √n chunks of size √n each, where n is the size of the array.

Dividing into square root parts rmq

Now, find the minimum element index in each of these chunks and store it. The extra space required is O(√n). Finding the minimum for each chunk touches every element once, so preprocessing is O(√n * √n) = O(n).

To find the minimum element index in a given range, follow four steps:

1. Find the minimum index over all chunks lying entirely between the start and end of the given range: at most √n operations.
2. Find the minimum index in the chunk where the start of the range lies, from the start of the range to the end of that chunk: at most √n comparisons.
3. Find the minimum index in the chunk where the end of the range lies, from the start of that chunk to the end of the range.
4. Compare these three values and return the minimum.

No matter how big or small the range is, the worst case is O(√n), as there are only 3√n operations.

Let’s take an example and see how it works: find the minimum in range (2,7).

range minimum query example

To get RMQ(2,7) in the array, first find the chunks lying entirely within the range. There is just one, chunk 1, whose minimum index is M[1] = 5. Next, find the minimum index in chunk 0, where the start of the range lies, starting from the start of the range (index 2): it is index 2. Last, find the minimum from the start of the chunk where the end of the range lies to the end of the range: it is index 6.

At the end, compare A[5] (minimum over all chunks between start and end of range), A[2] (minimum in the chunk where the start of the range lies) and A[6] (minimum in the chunk where the end of the range lies), and we have the answer 5, as A[5] is the minimum of the three.

Aggregating everything, we found a way to solve range minimum query with complexity <O(n), O(√n)>.

Range minimum query using sparse table

The square root decomposition method uses only O(√n) space, but its query time complexity is also O(√n). To reduce query time at the expense of space, there is another method called the sparse table method. It combines the dynamic programming idea with the chunk-minimum idea.

In this approach, we consider chunks of size 2^j, where j varies from 0 to log n and n is the number of elements in the array. There are n log n such chunks, and hence the space complexity becomes O(n log n).

spare table implementation of RMQ

After splitting, find the minimum in each chunk and store it in a lookup table. M[i][j] stores the index of the minimum in the range starting at i with size 2^j. For example, M[0][3] stores the minimum index between 0 and 7 (8 elements). Now, how do we create this lookup table? It can be built bottom-up using dynamic programming. Specifically, we find the minimum of a block of size 2^j by comparing the two minima of its two constituent blocks of size 2^(j-1). More formally,

M[i][j] = M[i][j-1]               if A[M[i][j-1]] <= A[M[i + 2^(j-1)][j-1]]
M[i][j] = M[i + 2^(j-1)][j-1]     otherwise

How to find the minimum index in a given range now? The idea is to find two sub-ranges which together cover the entire range and take the minimum of their two minima. For example, for RMQ(i,j), let 2^k be the size of the largest block that fits into the range from i to j; then k = floor(log(j - i + 1)).
We have two (possibly overlapping) blocks to look at: from i to i + 2^k - 1 (already computed as M[i][k]) and from j - 2^k + 1 to j (already computed as M[j - 2^k + 1][k]).
Formally,

    RMQ(i,j) = M[i][k]               if A[M[i][k]] <= A[M[j - 2^k + 1][k]]
    RMQ(i,j) = M[j - 2^k + 1][k]     otherwise

These two blocks entirely cover the range, and since only one comparison is required, the complexity of a lookup is O(1).

In this post we discussed various ways to implement range minimum query based on space and time complexity trade-offs. In future posts, we will discuss applications of RMQ such as segment trees and the lowest common ancestor problem.

Please share if something is wrong or missing, we would love to hear from you.