Strings : Toeknize a string separated by delimiter

Continuing the previous post, let’s look at the second problem.

Problem statement

Find all tokens in a string which are separated by given delimiter.

Analysis

We have to traverse the string and for each character, check if that character is present in delimiter string. If it is, then it is end of the token started after previous encounter of the delimiter. Here forward we should scan all characters till the time we first encounter a character which is not a part of given delimiter string, that would be start of the next token.

Let’s look at it step by step.

Step 1 : Assume we have string as s and delimiter string as delim.
Search in s the first character which is not present in delim.
If there is no such byte, we are done, return NULL.
If there is such byte, then that would be start of our first token.

Step 2: Scan through the found token.
From the character found in step 1, scan through s till we again encounter a character present in delim.
If there is no such character, we are done, we have only one token and return the start position of that token.

If we have such character, that would be end of the token.

Great we have found one token, How about next one? 

For next token to be generated, we need to keep track where to start looking from in the string as we would have scanned a part of it in the previous tokens. So we keep track of the pointer where to start looking from.

Step 3 : Once we have found the end of token in step 2, again scan out all subsequent characters which are part of delim. Take the index of first non-delimiter character. This would be the starting point of our next token search.

Implementation note : To distinguish between whether it is first call to the function or subsequent call, we pass NULL pointer to the source string parameter indicating that it is subsequent call and not initial call.

Similar approach can be used to remove all characters which are present in a string from another string.

Code

Test cases

1. String and delimiter with one characters
S = “aaab_dddcc”
delim = “_”

2. String and delimiter with multiple characters
S = “aaab_dd?dcc”
delim = “_?”

3. Empty string
S = “”
delim = “_”

4. Empty delimiter
S = “aaab_dddcc”
delim = “”

5. String starting with delimiters
S = ___aaaaabbb
s = “_”

6. String ending with delimiters
S =”aaaa___”
s = “_”

7. String with no delimiter character
S = “aaaaaaaa”
s = “_”

8. String with all character as delimiter characters
S =”________”
s =”_”

Complexity analysis

Complexity of above code is O(N * M), where N is length of string and M is length of delimiter string.

Strings : Search functions

Introduction : String search functions

Many a times we need to find something or the other in a given string like find all words punctuated by given delimiter or find the location of a given sub string in bigger string etc. Under this topic we will discuss two problems 

1. Finding position of sub string in a given string.  (In this post)
2. Finding all sub strings in a given string which are separated by delimiter. (Next post)

Problem 1 : Find substring in a given string

Given a searched string S, and a substring s, check if s is present in S and if yes return the position in searched string.

Analysis

This is classic problem of traversing two string simultaneously and keeping track of characters in each.

One of the well known algorithm for this is needle and haystack algorithm. Steps of the algorithm are as follows.

1. Take begin of substring to be search.
2. Starting from begin of larger string S, do following
    2.a For each position pos in S, check if substring s can start from that position.
    2.b If the substring s can start, keep traversing both strings till either we find mismatch          or the substring is traversed completely.
    2.c. If substring is traversed completely, return the position where we started  i.e pos.
    2.d  If mismatch occurs, try matching from the next position.
    2.e  Repeat step 2.a to 2.d till we have traversed whole bigger string S.

Code

char * substr(char * haystack, char *needle){


  char * substring  = needle;
  if(needle == NULL)
      return NULL;
  
  if(*needle ==”) return haystack;
  
  if(haystack == NULL) return NULL;

  while(*haystack != ” ){
      char * pos = haystack;
 /* While substring is not terminated and characters in search string and substring are equal traverse both strings. */
     while(*needle !=” && *needle++ == *pos++);
                
 /* If we have reached end of the substring,return the pointer in search string where we started the search*/
     if(*needle == ” ) return haystack;

 /* else reset the substring pointer to start and increase the searched string by one position  */
     needle =  substring;
     haystack++;
  }
 /* If we reach end of the searched string, return NULL */
  if(*haystack == ”) return NULL;
}

There is one optimization where we can stopping matching substring once we know that the number of characters left in search string to match are less than the characters in substring.

Test cases

1. When substring is present

S = “aaabcbcd”
s = “bcbcd”

2. When substring is not present
S = “aaacbcd” 
s = “bcbcd”

3. Substring provided is empty
S = “aaacbcd” 
s =””

4. Substring pointer is NULL
S = “aaacbcd” 
s = NULL

5. Searched string pointer is NULL
S = NULL
s = “aaa”

6. Performance test, where we have a huge searched string and a small substring and it matches at the end.

Complexity analysis

Complexity of above code is O(N^2).


There are other algorithms like KMP pattern matching algorithms which do this is linear time, we would discuss that too in separate post.

String manipulation

Introduction : String manipulation

String is see is collection of character which is terminated with special character ”.

So whenever we want to allocate or reserve space for a string, we need to allocate one space extra for last NULL character.

char string[10] declares and defines a string which has 0 characters including NULL character.

Lets see some of the problems which involve traversal of string.

Problem 1 : Reverse a given string

Input : “abcd”

Output : “dcba”

It’s a very simple problem which explains the traversal of a string.
Idea is to have two pointers, one at the start of the string and other at the end. As we move for start pointer forward and end pointer backward, we swap characters at those position. As these two pointers cross each other, we stop.

void reverse(char s[]){
        int i=0, j;
        int end =  strlen(s);

     /Take care that we go only till end-1 as array is indexed from 0 */   
        for(i=0, j=end-1; i<j; i++, j–){
                char temp = s[i];
                s[i] = s[j];
                s[j] = temp;
        }

}

Problem 2 : Check if string is palindrome or not

Again just a traversal of string, one from the start and other from the end. As we move forward and backward, we need to check if characters at these positions are equal. As soon as we find one mismatch we return false. Continue till both indexes cross each other.

int palindrome(char s[]){
        int i=0, j;
        int end =  strlen(s);

        for(i=0, j=end-1; i<j; i++, j–){
                if(s[i] != s[j])
                        return 0;
        }
        return 1;

}

In palindrome problem always have two test cases :
One having string with odd length and other having string with even length.

Problem 3 : Copy on string to another

No big deal, traverse both strings together and copy ith character in source to ith index in destination.

There are couple of catches here.
First what if destination is smaller than source?
What if destination starts overlaps with source string?

Following code takes care of the first catch. Second can be taken care of by checking if dest base address is between source start and source end.
Usually we make source string as constant, so that second problem does not arise.

void copy(const char s[], char d[], int dest_size){
        int i=0, j;

        int len  = strlen(s);
        if(len > dest_size){
                printf(“String cannot be copied”);
        }
        for(i=0; s[i] != ”; i++){
                d[i] = s[i];
        }

        d[i] = ”;
}

Again take care of two things:
1. Always remember to terminate destination string with NULL character.
2. Always calculate the size of destination in calling function as if array is passed as parameter in a function, it will be treated as a pointer only and size will be always size of the pointer and not actual size of array.

In next post we would some other problems involving strings.

Fun with bits Part -3

Bitwise operation

Continuing from last two posts, today we will discuss problems which require slightly tricky bitwise operations to arrive at solutions.

Problem 1 :  Find two missing numbers in an array

We have seen solution of one missing element in this post. Can we somehow use that approach to arrive on solution?


Let’s take an example.

A = {1,4,5,6} Range is 1 to 6 and missing numbers here are 2 and 3.

Let’s follow steps we did for one missing number.


Step 1 : XOR all numbers 1 to 6
X =0001 XOR 0010 XOR 0011 XOR 0100 XOR 0101 XOR 0110 =  0111

Step 2 : XOR all numbers in set

Y = 0001 XOR 0100 XOR 0101 XOR 0110  =  0110

Step 3 XOR X and Y

Result  =  0111 XOR 0110 = 0001 = 1.

Find the LSB which is set in result. Using that information we know that one of the missing number should has that bit as set, in this case it should be bit at 0th position from right (0001).

So we can segregate numbers which have that particular bit set and those which don’t have that bit set.
Once we get that info, we can separately XOR them again with N numbers according to the condition of given bit being set.

Step 4 : Get the right most bit in result which is set.


Step 5 : Segregate number in two groups, one which have bit found in step 4 set and other which has numbers which don’t have that bit set.


In our given example C = {1,5} (with 0th bit is set) D = {4,6} (which don’t have 0th bit set)


Step 6 : XOR C and D separately.


X = 0001 XOR 0101  = 0100  Y = 0100 XOR 0110 = 0010


Step 7: Segregate all numbers ranging from 1 to n similar to step 5.


E = {1,3,5} F = {2,4,6}


Step 8 : XOR E and F separately.


W = 0001 XOR 0011 XOR 0101  = 0111 Z = 0010 XOR 0100 XOR 0110 = 0000


Step 9 XOR X and W will give us first number


Number 1  = X XOR W  = 0100 XOR 0111 = 0011 = 3


XOR Y and Z will give us second number

Number 2 = Y XOR Z = 0010 XOR 0000 = 0010 = 2

Finally we get our missing numbers.


Let’s test if it works for {1,2,4,6}


Result here will be 0001 XOR 0111 = 0110


Right most bit set is 1st bit from left.


C = {2,6} D = {1,4}

X = 0100  Y = 0101

E = {2,3,6} F ={1,4,5}

W = 0111 Z = 0000

First number =  X XOR W  = 0100 XOR 0111 = 0011 =3

Second number = Y XOR Z = 0101 XOR 0000 = 0101 = 5

Hence we can be sure that this works.

Code

Complexity analysis

Complexity of above code is O(N).


Fun with bits : Find rightmost bit set in given number

Bitwise operations

In previous post we discussed basics on bitwise operations.
In this post we would take some classic interview questions which can be easily solved if we apply some bit manipulations.

Problem 1 : Find the missing number 

Given a set of N numbers ranging from 1 to N, one number is missing, find the missing number.
Numbers are non-sorted order.

Analysis

We can easily find the element by having a hash key being numbers in set and increasing the count of the index whenever we encounter number in given set.
This approach required O(N) extra space. We want to avoid that extra space usage. 
Let’s look at XOR operation first.

 0 XOR 0 = 0
 0 XOR 1  = 1
 1 XOR 0  = 1
 1 XOR 1  = 0
Given above rules, we can derive that A XOR A = 0 because this will always make number of ones as even. For example,
9 XOR 9 = 1001 XOR 1001 = 0000.

Do we get some hint? 
We know that range of numbers is 1 to N, if XOR all numbers from 1 to N let’s call result as X. and then XOR all numbers which are actually present in set, call it Y and then if we XOR both X and Y, we get the missing number as all the numbers from 1 to N which are present in set will cancel each other.
Example,
Range is 1 to 5
Set as  {1,3,4,5} = {0001, 0011, 0100, 0101}

Step 1 : XOR all numbers 1 to 5
X =0001 XOR 0010 XOR 0011 XOR 0100 XOR 0101 =  0001

Step 2 : XOR all numbers in set
Y = 0001 XOR 0011 XOR 0100 XOR 0101 =  0011

Step 3 XOR X and Y
Result  =  0001 XOR 0011 = 0010 = 2 which is over missing element.

Code

@a as input set of numbers
@n as range of elements in input

int missing_elem(int a[], int n){
        int X =0, Y=0;
        int i, N;
/* XOR all elements in range 1 to n */
        for(i=1; i<=n; i++){
                X = X ^ i;
        }
/* XOR all elements in input set */
        for(i=0; i<n-1; i++){
                Y = Y ^ a[i];
        }
/* XOR X and Y will give us missing number.*/
        return X ^ Y;
}

Problem 2  : Find duplicate element

Given a set of numbers range from 1 to N, one number is missing and one number is duplicate. 
Find the missing and duplicate number.

Let’s work on an example
A  = {1,2,3,3,4,5} Range as 5

Step 1 : XOR all numbers 1 to 5
X =0001 XOR 0010 XOR 0011 XOR 0100 XOR 0101 =  0001

Step 2 : XOR all numbers in set

Y = 0001 XOR 0010 XOR 0011 XOR 0011 XOR 0100 XOR 0101 =  0010

Step 3 XOR X and Y as Z

Result  =  0001 XOR 0010 = 0011
Same logic mentioned above goes for this problem too.

One more problem which uses similar logic is “Find a number which occurs odd number of times in given input set”

Problem 3 : Find the position of right most set bit

Example : Let be the number be 18 = 00010010
Now if reset all bits except the LSB bit which is set and then take log (base 2), we can get the position of LSB which is set.
We can do that by 
x & !(x-1)

x-1 = 00010001
!(x-1) = 11101110

x & ! (x-1)  = 00010010 & 11101110 = 0000010 =2

Log2(2)  = 1, that is the position of LSB set.

Hence our return value becomes
log2(x &!(x-1))

In next post we would discuss couple of more problems on bitwise operations like finding two duplicate elements in a given input set etc. 

Fun with bits Part -1

Bitwise operations

Everything in computer is eventually represented as bits. A byte has eight bits, word has two bytes or 16 bits, and a long as two words or 32 bits. (True for 32 bit system).
There are only two states possible for each bit. So range of values represented by any collection of bits is 0 to 2^N -1 where N is number of bits in collection. If we take into account the sign bit, the range changes to -2^N to 2^N-1.

For example range of
Char  is 0 to 255 (2^8-1)
Integer is 0 to  4294969275 (2^32-1)

There are many operations which are bit specific and many other which can be efficient performed using bit manipulation. In this post we are going to see some problems which can be solved using bit operations.

Some basic operations on bits

1. OR
Rule : If any one bit is set, result is true or set

2. AND
Rule : If all bits are set, then only result is set.

3. NOT
Rule: Switch the bit, i.e. if unset, result is true and vice-a-versa

4. XOR
Rule If odd number of bits are set, then only result is true.

We will see Left rotation and right rotation operation on bits as we work on problems.

Problem 1 : Find sign of a number

How do we identify sign of an integer, yes, it’s the first bit in the number. So solution of this problem is to just check the first bit of the given number and we can easily do that by ANDing Most significant bit with 1. Catch here is we need to we don’t need to use mask.

Hence solution is

sign = (value >> (sizeof(int) * CHAR_BIT – 1));

If number is negative, then it MSB of the number will be copied on all bits as we shift right. All 1s will give us -1. If the number is positive, MSB is 0 and hence the overall number after shifting will be zero.
Hence this implementation return 0 if number is  positive and -1 if number is negative.
This is architecture specific code.

Problem 2 : Check if number is power of 2

Let’s check some of numbers which are power of 2
2  = 0010
4 =  0100
8 =  1000
16 = 0001000

So is there any pattern? Yes there is. If a number is power of two, it has only one bit set.
How do we find that only one bit is set?

Let check numbers which are one less than power of 2
2-1 =1 = 0001
4-1 =3 = 0011
8-1 =7 = 0111
16-1 =15 =01111

Do we see something here? yes that is all the bits after the bit which is set in number which is power of 2 are set.
So, v & v-1 will give us 0 is number is power of two.
2 & 1 =  0010 & 0001 = 0000
4 & 3 =  0100 & 0011 = 0000
8 & 7 = 1000 & 0111 = 0000
9 & 8 = 1001 & 1000 = 1000

Hence we have to just return value & value-1.
If returned value is zero, then number is power of 2. If non-zero, then number is not power of zero.

However if input is zero, then we incorrectly return it as power of 2. Hence the final code should be

value && ! (value & value-1).

Let’s see what happens when value is zero.

Expression becomes 0 && ! (0 & -1) = 0 && 1 =0. In this implementation, we return non-zero if number is power of 2 else 0.

Problem 3 : Find number of bits set.

First solution is to scan through all bits of number and count which are set. This requires N operations where N is number of bits in number.

We can do better than that.

10 = 1010 9 = 1001 = 10 & 9 = 1000
11 = 1011 10 = 1010 = 1010
12 = 1100 11 = 1011 = 1000
14 = 1110 13 = 1101 = 1100

We can see from above table that, when we AND a number with number -1, it clears the least significant set bit.

So every time we and N and N-1, we clear LSB, one bit at a time. We can continue to do so till our number turns to zero.

for(i=0; v!=0; i++)
     v & = v-1;

This code will require only M operations where M is number of bits set in number.

These are three basic problems involving bit wise operation. We would discuss couple of problems which use XOR operations on finding duplicate element or missing element in a given input array.

Linked list : Detecting loop in singly linked list

Basics about singly linked list can be read here.

Detecting loop in singly linked list

Given a singly linked list, which may or may not contain a loop in it, we need to find out whether there is a loop and it yes, start of the loop should be returned.


For example, in following linked list, function should return 4.

Singly linked list with loop

Analysis

In singly linked list, we can traverse only in one direction and end of the list is detected when NULL pointer is encountered. What if linked list has a loop in it? Then we would never reach end of the list and circle around in the loop.
So let’s use this property. We move two pointers at different speed to traverse list, we would surely meet at a node which in the loop.
Take two pointers, first which moves one node at a time, call it slow; other which moves two nodes at a time, call it fast.
When slow meets fast, that node will be in the loop.
Traversal of two pointers in above list would be
Slow : 2 , Fast : 3 
Slow : 3 , Fast : 5 
Slow : 4 , Fast : 3 
Slow : 5 , Fast : 5 

First problem is solved. If before reaching NULL, if slow meets fast, then there is a loop in list.

Second problem requires some thinking.
What would be the condition when slow meets fast? Can we have the length of the loop? Yes. 
Move the slow pointer till it again meets faster pointer and keep count of number of nodes. Let number of nodes in loop be K

Now the problem reduces to finding Kth node from the end because if there are K nodes in loop, start node would be Kth node from the end of the list. Move one pointer K nodes ahead of head and keep other at head. Move both of them simultaneously. When they meet, that node is starting point of the loop.

Algorithm to detect cycle in linked list

Cycle detection
  1. Take two pointers, slow and fast.
  2. Move slow one step at a time, fast two steps at a time.
  3. If the meet before slow reach NULL, then there is cycle in list.
  4. Else return false.
Starting node detection
  1. Find the length of the loop in the list, by moving slow again to meet fast and keep track of count.
  2. Once we find no of nodes in loop, K,  find Kth node from the end. (Can be read here)
Termination condition will not be NULL but two pointers one starting from head and other K nodes ahead of head being equal.

Code

Test cases

1. Normal list with loop
L1  = 1->2->3->4->5->6->4 (6 pointing back to 4)

2. Normal list with no loop
L1  = 1->2->3->4->5->6->NULL

3. Empty List
L1 = NULL

4 List with complete loop
L1  = 1->2->3->4->5->6->1 (6 pointing back to 1)

5. List with duplicate elements
L1  = 1->1->2->3->4->4->5->6->4 (6 pointing back to 4)               

Complexity of above algorithm to find cycle in a linked list will be O(N).