Find first non repeating character in stream of characters

Non repeating character in stream of characters

There is a stream of characters, find first non repeating character at any given point of time.


If we have a predefined set of characters and we need to figure out first non repeating character, then it is very simple. Create a hash table keyed on character and as and when we encounter a character in given set, we increase the count. Once we have processed all characters in set, we can go through the set again and return the first character which has count of occurrence of character is 1.
However, this approach cannot work when we have a continuous stream of character. Now let’s look for some hints. We need to find the first non repeating character till a given point of time. If we we query at time t0 and next instance of character comes at t1 where t1> t0, we will return the character at t0. Once we have hit the second instance of the character, we will discard that character. But what will be the first non repeating character at t1?
So we need to store all the characters which are non repeating till t0 and when queried for first non repeating character, we will return the first character in the list. Got some hint? There is a pattern of characters coming in and leaving out and that is First come first out. Which data structure do we need to implement that? Yes, Queues. Done we finalized one thing : To get non repeating character at any time we need to implement queue and store each character which is seen only once in that queue.
Now how about remove a character when character occurs again. We need to remove that node from the queue. This node can be at any place in the queue. What is the best structure where to implement queue when we have to delete node from any place, from start, end or middle ? That is doubly linked list.  So till now we have figured out how to delete a character entry from queue and how to get the first non repeating character.
However, how to find the exact node to be deleted in queue implemented using doubly linked list. Either we traverse DLL each time, or we can store a mapping. Mapping will be between character and node pointer in DLL.
Now to check if this is character is seen two or more times, we can use a hash where we make true to character entry when we see character second time. So next time on wards we will not do any processing on that character.
Great! We got data structures. Now let’s summarize algorithm to find first non repeating character
  1. Take the input character. Check in ‘visited’ array if this character is seen more than two time.
  2. If yes, don’t do anything, process next character.
  3. If it not seen two times yet, then we need check if it is seen first time.
  4. If seen first time, DDL node entry corresponding to that character will be null. Add node in DLL and update the hash table.
  5. If character is already seen once, we will have a node entry in hash table. We will remove that node from queue and marked ‘visited’ entry as visited.
Let’s work out an example:
Stream of characters is : a,t,r,e,r,a,p,p,p,a,u,i,b,t,………….Continues
At first all our containers are false and NULL, queue is empty. 
Initial state

Character ‘a’ comes : Looking at visited array,we see that it is definitely not seen more than twice, else the place would have been marked true. Also looking at node map array we can see that this character is still not in queue, it means we have seen it first time. So we add this to queue and store pointer in node map array.
Data structure condition :
Now comes ‘t’, same above conditions are true, hence ‘t’ is also added to queue.
Similarly for r and e too.
first non repeating character in stream
When first instance of a character is seen

Twist comes when ‘r’ is seen again. This time we have a node pointer stored in node map array corresponding to ‘r’, but visited flag is false. This means we have seen ‘r’ second time. We till remove node ‘r’ from queue and mark visited[‘r’] as true, hence forth character will not considered as first non repeated character. So our data structures at this point look like this:

When second instance of character is seen
After this if query for first non repeated character, output will be ‘a’.
We process next character ‘a’. Again same fate as above ‘r’. We will remove node from the queue and marked visited[‘a’] as true.
After this if query for first non repeated character, output will be ‘t’.
This will continue is similar fashion. Please comment if further explanation is required on example.

Code to find first non repeating character

Complexity Analysis

Complexity of to find first non repeating character will be O(1) to find first non repeating character at a given point of time. We would use NO_OF_CHARACTERS extra space. 

Binary Search Tree : Vertical sum of nodes

Problem statement

Given a binary search tree, sum all the nodes which are at on the same vertical line. For example, in below tree nodes 6 and 5 are on same vertical line.

If below tree is given as input, output should be like :
Sum of vertical line 1 —> 2
Sum of vertical line 2 —> 3
Sum of vertical line 3 —> 11
Sum of vertical line 4 —> 7
Sum of vertical line 5 —> 9


If we consider root at vertical line 0, the all the nodes in the left side will be on the negative index. Every time we move towards left child, we decrease the index of vertical line and when we move towards the right child,we increase the index of vertical line.
Mechanism will be same as level order printing of tree, for each movement towards left or right, we will pass decreased or increase index of vertical line to the recursive call of the the function. Whenever a node is at the same vertical line it will be added to the sum already accrued till now on that vertical line.
Only hurdle in this implementation is how to implement negative indices. There are data structures like Hashmap available in high level languages which can solve this problem. However, in C we need to come up with some other mechanism. 
To make life simpler, we can have limit on number of vertical lines we can have in tree, lets call it HD_OFFSET. When we pass root to the function call, we will map 0 index to HD_OFFSE/2. Sum of all vertical lines left of root will be stored between 0 to HD_OFFSET/2 and sum of all vertical lines on right side of root will be stored between index HD_OFFSET/2 to HD_OFFSET.
Implementation is very simple if you have understood recursive level order printing mechanism.
Go through this post if you have doubt : Binary Search Tree : Level order printing


Complexity Analysis

Complexity of above code will be O(N) because we traverse all the nodes if the tree at least once.

Check if linked list is palindrome

Check if linked list is palindrome

Given a singly linked list, find if it is a palindrome or not. For example, 

Linked list in figure 1 is palindrome.

Palindrome linked list
Figure 1 : Palindrome linked list

Linked list in figure 2 is not a palindrome.

Figure 2 : Non palindrome linked list


By definition of palindrome, a linked list is palindrome if it reads forward and backwards same.
To check if string is palindrome, standard way is to keep to pointers, one moving from left to right and other from right to left and at each point we check if  character at this place match. If at any place they don’t match, we will say string is not palindrome and return. Once two pointers cross each other, we say string is palindrome.
Now can we apply that algorithm on linked list? No, because we cannot move backward that is from right to left in case of singly linked list. So what is the way?
One way is to reverse the entire linked list into a new linked list and scan both linked list simultaneously in forward direction. If they differ at any point of time, return false. However, this takes O(N) extra space.
Another way is to put half the list on the list. Now pop each element from the stack and move forward the linked list.If two elements are different at any time, linked list is not palindrome. We have reduced the space used to half.
Third way is to reverse the linked list till the middle node, and then starting from middle node, move in opposite direction comparing each node. If any of the node differ, return false. If we reach at head and tail of the list, then linked list is palindrome. Once done, reverse the half linked list again and restore the original list.
Let’s see the implementation of it.

Code to find if list is palindrome

Driver code for linked list creation. Replace the function call with appropriate function.

Complexity Analysis

Complexity of palindrome check algorithm for linked list is O(N).

Graphs : Topological sorting

Problem statement

Given a directed graph G (V,E), list all vertices such that for all edges (v,w), v is listed before w.
Simple example we can take is that of courses. Let’s say university provides four different courses, A, B, C and D. For taking B, A is prerequisite. Similarly for taking C, A and B are prerequisites. For D, C and B are prerequisite. If created a node for each course and have a directed edge from prerequisite course to other courses, we will get a graph to work with. 
Graph representing courses prerequisites
If somebody asks us to figure out in which order he/she must take these course, we just do the topological sorting of the graph, it will give us answer. Topological sort order will be A, B, C ,D


First we need to figure where to start. What do we understand from above example? 
Yes, we should take course first which has no prerequisites. If a course has no prerequisites, then there will be no incoming edges to that course. In graph terminology, node would have zero in-degree.
OK, we got the first node. Now how to proceed. Lets go back to our example again. If we have taken a course, what we can do is remove edges from that course to all other courses for which it was prerequisite. Why? Because if we have taken this course we are eligible to take all courses which have this course as prerequisite. Correction : If this course was only prerequisite.
So when we visit a node, what we do is we decrease the in-degree of all adjacent nodes.
Now again search for course, which has no prerequisite and process as above. In graph terminology, search for the node in update graph, which has zero in-degree and repeat above steps till we have no such node which has zero in-degree. If there are still some nodes which are not visited, it will be in case of cycles in graph,  topological sorting is not possible.
Let’s take an example and work it out.
Topological sort of a directed graph


Complexity of above code O(V^2 +E). For each vertex we need to find the vertex with zero in-degree, hence the quadratic time.
Let’s think of something which can make this update and search less complex.
Whenever we are updating the in-degree of all the adjacent node, we can store all the vertices for which in-degree becomes zero in a queue. So we don’t need to separately search for the node with zero in-degree, we can simply take front of the queue.


Queue implementation can be referred here (In breadth first traversal) : Graph Traversals

Driver program for above codes is Complexity of above code is O(V+E) which is linear, although it takes extra space.

Linked List : Delete a given node from linked list

Problem statement

Given a linked list and a node value, delete given node from linked list.


To delete a node from linked list, first we need to find the node. We compare given value with each node of list and when value matches, we have found the node. To delete node, we need to make next pointer of previous node to point to next node of current node. However, we don’t have any access to previous node of the linked list once we have moved forward.

So what shall we do? We should keep track of previous node while traversing the linked list forward.

Deletion of node

Now there are three cases we should take care of:
1. First when node to be deleted is head node.
2. Second when node to be deleted is any of the middle node.
3. Third when node to be deleted is last node.

In first case, we should also update head pointer of the linked list. For second and third case we can just change the pointers as explained above. (Next of previous points to next of current).


What if some one asks you not to use previous pointer. There is a very well known method for that too. Idea is to traverse till the node which is to be deleted. Then copy value of next  node onto the current node. Once done, change next pointer of current node to point to next of next node.

This method will not work perfectly if node to be deleted is the last node. In this case even if we avoid accessing the null node after last node, we may not be able touch previous node’s next pointer, which will point to a freed node once we delete last node.
Good point to make in interview.


Driver program to run above code is

Complexity Analysis

Complexity of both of above algorithms is O(N) where N is number of nodes in linked list.

Graphs : Dijkstra’s Algorithm to find shortest path

Problem Statement

Given a graph, directed or undirected and two nodes, find shortest path between these two nodes.


This is a standard problem and we don’t need to figure out what to do. We will have adjacency list representation of graph. 

Algorithm is widely published and is as below.

  1. Initialize distance of all nodes from start node as INFINITE and all nodes as not finalized.
  2. Take source node to start with, let’s say u.  Distance from source or start node to itself will be zero.
  3. Mark u as considered and distance finalized.
  4. Now for all neighbor nodes v of it, update the distance if the current distance is more than distance of  u + weight of (u,v). For example, if the current node A is marked with a distance of 6, and the edge connecting it with a neighbor B has length 2, then the distance to B (through A) will be 6 + 2 = 8. If B was previously marked with a distance greater than 8 then change it to 8. Otherwise, keep the current value.{From Wikipedia}
  5. Now select a node for which distance is not finalized and distance is minimum till now and make it u. Go to step 3.
Let’s work out an example and see how it works. Input graph is
Input graph
Execution of above algorithm on this graph will be
To figure out the path which was followed to reach destination from source, we can have an array to keep track of the parent node whenever distance to the node is updated. By reverse tracking parents from destination to source, we can figure out the path.


Complexity Analysis

Complexity of above code is O(V^2) where V is number of vertices in graph. This can be reduced to O(E log V) by using heaps. Heaps will reduce complexity of searching minimum weight cost from O(V) to O(log V). 

Limitation of algorithm
1. It does not work with negative weights.

Graphs : Word ladder or word golf

Problem statement

Let me explain problem in detail first. We are given two words A and B and a dictionary of words. Task is to transform first word into second word by changing one character at a time. Character should be changed in such a way that the new word should be in dictionary. Find minimum number of such hops from one word to reach the second word.
For example, lets say given word are  CAT and DOG
Word transition
Sequence of steps will be CAT –> COT–>COG–>DOG, hence three substitutions and we are done.
To convert COLD to WARM (from Wikipedia)
Sequence of steps will be COLD–>CORD–>CARD–>WARD–>WARM.


There are two strings/words given as input. And hoping can be done only if there is one condition : the destination of hop should differ at most one character with source. So if two words differ only in one character, we have a relationship between them.
So if take those two words as two nodes and connect them only if they differ by one character,  we get a graph with two nodes and one edge.
Apply same principle to all words in dictionary and create a graph, nodes representing words and edges the relationship.
Once we have created graph as per above conditions, start breadth first traversing from the node which has values as first word and go on scanning till we get a node with value as second word.
That’s all. Simple analysis! Implementation is bit tricky. Let’s look at implementation.
For each word, if we go an scan every word in dictionary to figure out if they differ by one character in order to create edge will be too inefficient with complexity of O(N^2).
If we create buckets of words with one character of that replaced with _ and put all similar words into that bucket, we will segregate words with one character different. Let’s take an example.
Dictionary given is  {“CAT”, “BAT”, “COT”, “COG”, “COW”, “RAT”, “BUT”,”CUT”, “DOG”, “WEB”}
Now for word CAT, there are three buckets possible.
Now take word BAT, if we replace B with _ it becomes _AT, for which we already have a bucket, so we can put CAT and BAT in same bucket.
Similarly when we replace second character of COT, it becomes C_T, for which we already have a bucket, hence CAT and COT are placed in once bucket.
With above given words buckets will be like below.
Now connect all words in same buckets with each other.
Final graph will be like below:


This above code needs optimization, please provide suggestions to improve.
Breadth First Search implementation is as follows