Trie data structure implementation c

Tries as data structure

Trie data structure are usually neglected by programmer while designing solutions. Tries are very useful in cases where strings are involve with good amount of duplication of their prefixes.

What are trie data structure?

Trie is a data structure which stores information, referred to as key, in such a manner that the common information is stored only once. Usually keys are strings. No node in trie stores the whole key, but the position of the node gives us information that which key it is part of. All the descendant node in trie have the same prefix, hence trie is also know as prefix trees.Word trie is taken from retrieval word.

Other property of trie is that every node of trie has multiple children. In case of string (assuming lower case), there will be 26 children of each node.
Take an example: We have following strings: “House”, “Home”, “Tree”
I need to store this information, Trie for above words would look like:
Trie data structure example

Insertion in trie data structure

Before insertion we need to understand that how can we represent a node of a trie.Since intermediate nodes of a key do not store any information at them, we need to distinguish between the leaf node and the intermediate node. For that we can store a field called as value inside each node, If the value is zero, it’s an intermediate node, if non-zero, it’s leaf node.
In our case we are assuming MAX_SIZE of 26.

Basic initialization of trie data structure

#define MAX_SIZE 26

#define GET_CHAR_INDEX(ch)\
        ((int) ch - (int)'a' )

#define LEAF_NODE  1
#define true 1
#define false 0

typedef struct trie_node_t {
        int value;
        struct trie_node_t * children[MAX_SIZE];
}Node;

typedef struct trie{
        int count ;
        Node *root;
}trie;

void initialize(trie *t){
        t->root = (Node *)malloc(sizeof(Node));
        t->count =0;
}

Now we can go and see insertion part.

  1. Scan through the string character one by one.
  2. For each character, check if it has child nodes. Character acts as index in children array.
  3. If node at char index  has child node, move to child node and take next character of string. Repeat step 2 and 3
  4. If node at char index doesn’t have child node, create a child node and add it to trie. Move the character in string and move down to newly created child. Go to step 2.
void insert(trie *t, char key[]){

        int len = strlen(key);
        int i;

        if(t->root == NULL){
                printf("\n Trie is not initialized\n");
        }

        Node * current = t->root;
        for(i=0; i<len; i++){

             /* Get the index of char in children array */
             int index  =  GET_CHAR_INDEX(key[i]);

             /* If the children array does not contain the pointer
                at index pointed by char, allocate new node */

             if(current->children[index] == NULL){
                current->children[index] = (Node *)malloc(sizeof(Node));
             }
             /* Else traverse down the trie. */
             current = current->children[GET_CHAR_INDEX(key[i])];
        }

        /* To mark it as leaf */
        current->value = LEAF_NODE;
}

Search in trie data structure

Best part of trie is its search. It is done in O(M) where M is length of input string.

  1. Start from the root. and take the first character of string.
  2. If the array index pointed by the character is NULL, return false.
  3. Else move to the child node and next character.
  4. Repeat step 2 and 3 till the end to input string.
  5. If we reach till the end of string and the current node is leaf node (identified using value), then return true.
int search(trie *t, char key[]){
    int len = strlen(key);

    if(!(t->root))
        printf("\n Trie is not initialized\n");

    Node * current = t->root;

    for(int i=0; i<len; i++){
        int index  =  GET_CHAR_INDEX(key[i]);

        /* If characters are left in key and 
           we have reached at NULL, there is 
           no key present */
        if(!(current->children[index])) return false;
                 
         /* Else traverse down the trie */
         current = current->children[index];
    }

    /* If we have reached the leaf node, 
       key is present */
    if(current && current->value == LEAF_NODE )
        return true;
    
    return false;
}

Only drawback with tries is that it takes a lot more space (K *  MAX_SIZE * N) as compared to binary search tree. It is evident from the fact that every node in trie has to have place for all the possible characters.
In time complexity terms, insertion and search both are O(M) where M is length of string being entered.
In this post we saw  insertion and search operation in trie data structure. We will use these operations in future post and solve some real problem like Find words in matrix

Please share if there is something wrong or missing. If you are interested in contributing to algorithms and me, please contact us.