Non repeating character in stream of characters
In post here
, we learned how to find first non repeating character in a string. In this case, entire set of character was available to us up front. Now, let’s see a problem where there is a stream of characters and not a string. So, the problem statement is something like this : Given a stream of characters, find first non repeating character
at any given point of time. For example stream is like a,b,c,c,c,b,a,d,e,e,f,…. First non repeating character in this stream till now is d. So, we have to return first non repeating character at a given point of time.
If there is predefined set of characters and task is to figure out first non repeating character, then it is very simple. Create a hash table keyed on character and as and when character is encountered in given set, increase the count. Once all characters in set are processed, go through the set again and return the first character which occurs once in set.
However, this approach cannot work when there is continuous stream of characters. We need a different approach. First of all, keep in mind that we have find first non repeating character a given point of time. If we query at time t0 and next instance of character comes at t1 where t1> t0, we will return the character at t0. When character is seen second time, it will be discarded. But what will be first non repeating character at t1 then?
Figure out data structures and behavior
In order to answer above question, store all the characters which are non repeating till t0 and when queried for first non repeating character, return first character in the list. Got some hint?
There is a pattern of characters coming in and leaving out and that is First come first out. Which data structure provides that? Yes, Queues. We have finalized one thing : To get non repeating character at any time we need to implement queue and store each character which is seen only once in that queue.
Second part is to remove a character when it occurs again. As characters seen only once till given point of time are stored in queue, we need to remove node from queue. This node can be at anywhere in queue. What is the best structure where to implement queue when we have to delete node at anywhere, from start, end or middle ? Doubly linked list. Till now we have figured out how to delete a node or character entry from queue when it occurs second time and how to get first non repeating character.
However, how to find the exact node to be deleted in queue implemented using doubly linked list. Either we traverse DLL each time with O(n) complexity, or we can store a mapping. Mapping will be between character and node pointer in DLL. Given a node pointer in DLL, how to delete it can be read here :Delete a node from doubly linked list
There is a small catch here : What if the character is seen third time ? We go to the hash <character, node> and get the node. Then try to free the node again which is already freed when it occurred second time. In this case, segmentation fault for sure! 🙂
One solution is to remove node from the hash. But then there is no way if the character never appeared or appeared twice.
To check if character is seen two or more times, use another hash with character as key and boolean as value. Set hash <character> to true when character is seen second time. So, first check in this hash, if the character entry in map is true, do not do any processing.
Great! We have got data structures. Now let’s summarize algorithm to find first non repeating character in stream of characters
1. Take the input character.
2. Check in 'visited' hash if this character is seen twice already.
3. If yes, don't do anything, process next character.
4. If it not seen twice yet, then we need check if it is seen first time. If seen first time, DDL node entry corresponding to that character will be null.
5. Add node in DLL and update the hash table with the node value.
6. If character is already seen once, we will have a node entry in hash table. Remove the node from queue and marked 'visited' entry as true
Let’s work out an example: Stream of characters is : a,t,r,e,r,a,p,p,p,a,u,i,b,t,………….Continues. At first all our containers are false and NULL, queue is empty.
Character ‘a’ comes : Looking at visited array,we see that it is definitely not seen more than twice, else the place would have been marked true. Also looking at node map array we can see that this character is still not in queue, it means we have seen it first time. So we add this to queue and store pointer in node map array.
Data structure condition :
Now comes ‘t’, same above conditions are true, hence ‘t’ is also added to queue. Similarly for r and e too.
When first instance of a character is seen
Twist comes when ‘r’ is seen again. This time we have a node pointer stored in node map array corresponding to ‘r’, but visited flag is false. This means we have seen ‘r’ second time. We till remove node ‘r’ from queue and mark visited[‘r’] as true, hence forth character will not considered as first non repeated character. So our data structures at this point look like this:
When second instance of character is seen
After this if query for first non repeated character, output will be ‘a’.
We process next character ‘a’. Again same fate as above ‘r’. We will remove node from the queue and marked visited[‘a’] as true.
After this if query for first non repeated character, output will be ‘t’.
This will continue is similar fashion. Please comment if further explanation is required on example.
Code to find first non repeating character
Complexity of to find first non repeating character will be O(1) to find first non repeating character at a given point of time. We would use NO_OF_CHARACTERS extra space.