Given a sequence of words, print all anagrams together

Print all anagrams together

Many a times we need to sort some things which are not integers, foremost being strings. How do we go about using quick sort. I did not know that C library has a function for it. This function is very powerful and easily sort anything given the compare function. So in today’s post we will be using that function and learn how it can be used to solve our problem statement.
Given a list of words, group all words together which are anagrams. For example, if input is like :

cat
tac
act
dog
god
gdo

Analysis

First of all, we want to understand how to strings can be identified as anagrams. There are more efficient ways to do it, but the most simple one is sort two string based on characters and if two resultant strings match, two strings are anagrams.

Now the problem boils down to sorting each individual string and then sort the entire array of strings. The anagram strings will automatically will grouped together.
If we look closely, we have another problem, that is when we sort the original strings individually, original strings are lost. So, idea is to have a duplicate array of strings and then sorting operation. Also to after sorting the duplicate array, we will loose the original index of string once we sort the entire array. Hence we need to store that information too.

Let’s figure what we need to store and how?
We have array of strings, let’s say there can be maximum 100 such strings.
So we create and array of character pointers to store these strings.
Also, we need a duplicate array where we will sort strings and also we need to store the position of the string in original array so, duplicate array would look like:
Next step is to sort individual string in duplicate array, where we will use the library function qsort().
This function is very easy to use, there are four parameters you have to pass:
1. The buffer or the array you want to have sorting performed on.
2. The size of buffer
3. Size of individual element of buffer.
4. And last and most important is the compare function, which has to be written in code and passed as function pointer to this function.
For further details of qsort() function usage follow : qsort

Once all the strings are individually sorted, sort the entire array of strings using the same qsort(), only difference will be the compare function which will now run on entire string instead of character.

We are almost done! Now we have all anagrams placed next to each other, however, we need to print original words. That’s where the index we stored in duplicate array will help and we will print words from original array using the order given in duplicate array.

Code for printing anagrams together

Complexity Analysis

This code will perform sorting on M strings, each of size N characters, so complexity of algorithm to print all anagrams together will be O(MlogM + MNlogN) MNlogN because NlogN for sorting each string of size N and there are M strings,MlogM to sort M strings in array.

Reading input from standard input

Reading input from standard input

Now a days,big companies usually have their first round of interview online where they outsource test to either HakerRank or some other such website. One thing which is of at most importance in those tests is that we need to pass test cases which are run when you submit the code. These test cases are taken from file or standard input and need to be read from standard input. For example, question based on array and array elements will be given on one line on STDIN, without mentioning the size of the array first. So input will be like  34 45 56 78 89 100. We need to read this input, process it and then produce the output.
Many a times, we flounder when we need to take inputs from stdin as we are not used to such things in our day to day coding. In this post I am covering methods in which you can read inputs for STDIN.

Solution

Whenever we are reading from STDIN, use fgets() to read the entire line from it. fgets() takes three arguments, first the buffer where it will read the input to, second size of input to be read in bytes and third the source of input which in our case will be STDIN. For details refer : fgets()
Now, once we have read the entire line of integers from STDIN in buffer, we need to segregate each integer from it. There comes the sscanf ()function.
sscanf() takes three arguments, first the buffer which is read by the fgets(), second the format string of input like “%d” for integers, and last the place where it need to put the scanned input. For more details on sscanf please refer : sscanf()
Now, let’s put this all together and see if it works. In following code, I am scanning a list of numbers from STDIN and storing them in an array and then printing that array. If you have notice, in code we have something called as n which is also read with sscanf, it helps us in finding the pointer in the buffer from which we need to read next integer from, else we will always be reading the first integer only.
Now look at when there are multiple lines are given as input from the STDIN like below are the edges of a particular graph, from which we need to store graph representation:
2 3
4 5
6 8
Reading of this kind of input is very similar to what we saw above. Only thing is now we need to put fgets() in while loop till we have input to read from line.
Notice one thing that now, we are checking return value of sscanf() against two because now we are explicitly reading two integers from the line, whereas in previous case we were reading one integer at a time.
This is how one can read from standard input and please let me know if you could not follow something or want to add to it.