Print all anagrams together C implementation

Print all anagrams together

Many a times, we need to sort some elements which are not integers, like strings. How do we go about using quicksort. I did not know that C library has a function for it. This function is very powerful and can easily sort anything given compare function. So in today’s post, we will use quicksort function and learn how to solve our problem. Problem statement is : given a sequence of words, print all anagrams together. Instead of printing, problem can be asked as group all anagrams together. For example, if input is like :cat, tac, god ,gdo, ball, act, dog, output should be

cat
tac
act
dog
god
gdo

Print all anagrams together : Algorithm

First of all, understand how to identify strings as anagrams. There are many efficient ways to find if two string are anagrams or not, but a simple one is to sort two strings based on characters and if two resultant strings match, two strings are anagrams.

Now problem reduces to sorting each individual string and then sort entire array of strings. Anagrams will automatically be grouped together.

Look closely, there is another problem with solution mentioned, that is when we sort original strings individually, original strings are lost. So, idea is to have a duplicate array of strings. First, copy all string to new array and then sort string individually.  Still there is one problem : after sorting new array, we will loose index of string before sorting. Hence, we need to store index of each string before sorting new array.

We now know, what to store, let’s discuss how to store those information? Original strings are stored in an array of character pointers. Also, we need a duplicate array to sort strings. To store positions of strings in original array.  Hence, let’s create a structure which stores both information, string and position.


typedef struct duplicate{
    char *word;
    int pos;
} Duplicate;

First step is to sort individual strings in duplicate array, using library function qsort().
This function is very easy to use, there are four parameters to pass:
1. The buffer or array to sort.
2. The size of buffer or array
3. Size of individual element of buffer.
4. And last and most important is compare function, to be written in code and passed as function pointer to this function.

For further details of qsort() function usage, refer :qsort in c

Once all strings are individually sorted, we sort entire array of strings using same qsort() function with different compare function, which now runs on entire string instead of each character.

We are almost done! All anagrams are placed next to each other, however, we need to print original string together. However, all anagrams will be same string in sorted array. That’s where index stored in duplicate array will help. Print words from original array using the positions given in duplicate array.

Print all anagrams together : Implementation

#include<stdio.h>
#define MAX_WORDS 101

typedef struct duplicate{
    char *word;
    int pos;
}duplicate;
/* Compare function for string comparison */
int compare(const void * a, const void * b){
    return *(char *)a - *(char *)b;
}
/* Compare function for string array sorting */
int compare_str(const void *a, const void *b){
    duplicate *word_1  = (duplicate *) a;
    duplicate *word_2  = (duplicate *) b;
    return strcmp(word_1->word, word_2->word);
}
/* This function copies array in duplicate array */
void copy_in_duplicate(char * array[], 
                       duplicate  dup_array[], 
                       int count){
    int i ;
    for(i=0; i<=count; i++){
        dup_array[i].word = (char *) malloc(sizeof(char) 
                           *strlen(array[i]) + 1));
        strcpy(dup_array[i].word, array[i]);
        dup_array[i].pos = i;
    }
}
int main(){
    unsigned int count = -1;
    unsigned int i;
    char *array[MAX_WORDS];
    duplicate dup_array[MAX_WORDS];
    char word[100];
    /*Reading from the standard input */
    while(fgets(word, sizeof(word), stdin) != NULL){
        count++;
        array[count] = (char*)malloc(sizeof(char) * 
                             (strlen(word) +1));
        if(sscanf(word, "%s", array[count])  == 0){
            printf("\n Error inreading strings");
        }
    }
    
    copy_in_duplicate(array, dup_array, count);
    /*Sort each string individually */
    for(i=0;i<=count; i++){
        qsort(dup_array[i].word, strlen(dup_array[i].word), 1, compare);
    }
    /* sort all strings in array */
    qsort(dup_array, count+1, sizeof(dup_array[0]), compare_str);
    /* printing all words with anagrams together */
    for(i=0;i<=count; i++){
        printf("\n %s", array[dup_array[i].pos]);
    }
	return 0;
}

Implementation will perform sorting on M strings, each of size N characters, so complexity of algorithm to print all anagrams together will be O(MlogM + MNlogN) MNlogN because NlogN for sorting each string of size N and there are M strings,MlogM to sort M strings in array.