Find Euler path in a graph

Find Euler circuit and path in a graph using Fleury’s algorithm.

In last post, Graphs: Find bridges in connected graphswe discussed how we can find bridges in an undirected graph. In this post, we will be discussing an algorithm which uses bridges to find Euler’s path in a graph, algorithm is called as Fleury’s algorithm.

As per the definition, Euler’s path or circuit in a graph is a path which traverses each edge of the graph once between source and destination. If source and destination are same, then it becomes Euler’s circuit. To understand how we can find that there is an Euler path existing in a graph, please follow this post: Graphs : Euler circuit and Euler path in graphs

Fluery’s algorithm to find Euler path or circuit

  1. Start from the source node, call it as current node u. If there are nodes with odd degree (there can be max two such nodes), start any one of them. Else start from any node in graph.
  2. Traverse any edge (u, v) from current node which is not a bridge edge.
  3. Set current as v and go to step 2
  4. When all edges which are not bridge edges are traversed, start traversing bridges edges.
  5. Stop when all nodes are traversed.

Walk through Euler path step by step

1. Graph is as follows, there are two nodes 1 and 6 which have odd degrees, so our candidates for start will be either one of these. Start from 6.

Initial graph with node 1 and 6 with odd degree

2. Traverse to 5 and then to 1 and then to 3 followed by 2 one by one as these are not bridge edges.

3. At 2, there are three edges (2,1), (2,4) and (2,4). However, if we follow edge (2,1) then we will burn the bridge and never be able to come back to rest of the graph, hence we will not be going to that edge. So, we will be moving edge (2,4).

4. At 4 again, we have two edges (4,2), (4,6) and (4,3), But if we go through (4,2) we will again burn bridge as (4,2) is bridge edge. Hence, we will traverse (4,3).

5. At 3, we are left with only edge and we will follow it, which will be followed by edge (6,4).
At 4, bridge edge now has to be followed at 4 because there is no option and same for edge (2,1).

Hence, Euler path will be 6-5-1-3-2-4-3-6-4-2-1

Code for Fluery’s algorithm
Please refer Graphs : Euler circuit and Euler path in graphs for detail graph creation and finding if there is an Euler path existing in graph.
Complexity of above algorithm is O ((V + E)* (V+ E)), as for each vertex V, we need to check each edge whether it is bridge or not which take O (V+E) time. Hence, the overall complexity is as given above.

Find bridges in connected graphs

Detect bridge edges in graph

Given a graph, detect all bridges in it.  An edge is called as bridge edge if and only if on removal of that node, graph becomes disconnected if it was connected graph and if it was disconnected then number of components increase by one.  For example in below graphs, bridges are shown in red.

Bridges in graphs
Concept of detecting bridges in graph will be useful in solving Euler path or tour problem.

Analysis

Depth First Search of graph can be used to see if graph is connected or not. We can use same concept, one by one remove each edge and see if graph is still connected using DFS. If yes, then edge is not bridge edge, if not, then edge is bridge edge. However, this method entails quite a complexity of O(E * (V+E)) where E is number of edges and V is number of vertices.

Let’s think something better. Consider that we are looking at edge (u,v) in graph. In what condition, we can say that it is bridge edge?
If we can somehow reach node u or any ancestor of u from any node which is decedent of v, that means graph is still connected and (u,v) is not a bridge edge. If above condition is not possible, then (u,v) is bridge.

How can we determine that there is no edge from decedent of v to u or its ancestor? For that we need to maintain time when a node was discovered during depth first search, call is d[].
d[u] is time when node u was discovered using DFS. If d[u] < d[v], means u was discovered before v.
Below is graph with d[] filled for each node.

DFS traversal order of graph

Now, figure out the lowest d[x] which can be reached from each node. Reason to find that is to see if there is a node x which is reachable from children of v and has d[x] less than u,i.e. x is ancestor of u reachable from children of v. Store lowest DFS ancestor reachable from node u in an array low[u].

So low[u] = min(low[u], low[v])  for edge (u,v)
Idea here is that if (u,v) is an edge, then either there is no back edge from sub tree of v to u or ancestor of u, in that case low[u] will be lowest. If there is a back edge to x from sub tree of v, then minimum d[x] reached by node in sub tree will be low[u].
Below diagram shows calculation of low[] in graph.

Calculation of low[]

Finally, if low[v] > disc [u] that means if discovery time of u is less than least ancestor that can be reached from sub tree of v, we have a bridge, because there is no way we can reach to any ancestor of u once we disconnect edge (u,v).

In above diagram, if we remove edge (3,4), still graph is connected as low[4] = 2 which is less than d[3] which is 3. Hence edge (3,4) is not a bridge.

Lots of theory,lets code it. We will be modifying Depth First Search implementation to keep track of d[] and low[].

Code for finding bridges in graph

Complexity Analysis

Complexity of finding bridges in graph code is O(V + E) where V is number of vertices and E is number of edges in graph.

Graphs : Euler circuit and Euler path in graphs

To understand basics of graph, please follow  link:  Graphs : Basics and representation
Today we will be discussing a typical problem on graph that is to find if there exists an Euler path or Euler circuit in graph and if yes, trace that path or circuit.

Problem statement

Find if a connected graph has Euler path or circuit.
Euler path is trail in graph which visits each edge only once from source to destination. For example in following graph, Euler path will be :

Euler Path of a graph

Similarly, Euler circuit is Euler path only where source and destination nodes is same node.

Analysis

There are two types of graphs : Directed and undirected graphs, for each of them condition for having Euler path and circuit vary a bit. So consider undirected graph first.
For undirected graphs:
For an undirected graph to have Euler path, each and every node of the graph should have even degree except from start and end node of path which may odd degree. To understand what is degree,please follow the link at the top which explains basics about graphs,

To have Euler circuit all nodes should have even degree with no exception.
For directed graphs:
For a directed graph to have Euler path, on each node, number of incoming edges should be equal to number of outgoing nodes except start node where out degree is one more than in degree and end node where incoming is one more than outgoing.
To have Euler circuit, all nodes should have in degree equal to out degree.
We have to keep in mind that for both directed and undirected graphs, above conditions hold when all nodes with non zero degree are part of strongly connected component of graph. To read more on strongly connected component, please follow this link : Graphs : Connected components

Now, since we know condition to have Euler path and circuits, we write code for it. All we need to do is to count in degree and out degree of each node and check for conditions above. If they meet, we are all good to say yes else no there does not exist any Euler path. 

Code

Complexity of above code is O(V+E)

Once we have figured out that there is a Euler path, we need to find that path. There are two algorithms for it.

  1. Hierholzer’s algorithm
  2. Fleury’s algorithm
In next posts we will discuss about these two algorithms and their implementations.

Find N most frequently occurring words in a file

Find N top occurring words in a file

Given a dictionary like text file, find n top occurring words in it i.e. n words whose count is the maximum. Or in other words, find most frequently occurring words.

For example, if given file is like follows:
one, two, two, three, three, three, four,four, four,four, five, five, five, five, five, six,six, six, six, six, six  and we are looking for three most frequent words, output should be :  six,five, four.

Let’s solve a problem which involves two concepts : One of hash maps and other heap.

Analysis

First step to solve this problem is to read the file. We can create an file object and open the file and read the file using fgets or gets. In this code, we are using C++ to read file. C code to read a file can be found here.

Second, we need to count frequency of each word. So when a word is read, we need to increment count corresponding to that word. Which data structure is best suited for this kind of operation? Hash of course. Hash gives us constant time complexity to lookup a string and increment its count.

Now, we have list of words and corresponding frequency of each word, we need to figure the top n most frequently appearing words. 
So, thump of rule says that whenever we have to figure our top or least from collection of items, heaps are best bet. To find top N, we will go with min heap.
This is how it works:
  • Take first N words appearing in map along with their count and create a min heap with these N elements.
  • Now, one by one read each word from hash and check if the frequency of the words is more than the minimum in heap, that is top of heap.
  • If yes, remove the top and add this word into the heap. If not, continue with next word.
  • When done with all words in hash, min heap will contain N most frequently occurring words in file.
To find K minimum elements using min heap is also explained here : Heaps : Find K smallest element from a given array

Implementation notes:

Problem has been solve using STL in C++, using map, vector and prioirty_queue libraries.
Also, hash map is created to map string with an object which contains string and its frequency to be used for min heap purpose. 
Also, default priority queue in implementation in STL gives max heap, to use it as min heap, new comparator is written.

Code to find top n occurring words

Reading M words from file will require O(M) time, where as creating N element heap will take O(N).
Also scanning through all elements will take O((M-N) Log N) complexity.
Hence overall complexity to find top N occurring words in a file will  be O(M Log N).

Same problem can be asked in other way:
Given  a file with million points (x,y) where x and y are x and y co-ordinates of point. find 100 closest points to origin.
Just update the hash to store the point and it’s distance. Then create a max heap with first 100 points and then read one by one all points and update the max heap accordingly. At the end max heap will contain 100 most closest points to origin.

Operating systems : Thread models using user and kernel space threads

Threads are very important concept in operating systems. We covered the basics and difference between a process and thread in Process Vs Threads post.
Let’s understand it in more detail.

Thread models

We briefly touched upon various thread models which are usually used for implementation of threads. These models are :
1. 1: 1 model
2. M:1 model
3. M:N model
So before understanding what these models are, we should understand what is difference between user space threads and kernel space threads.
User space threads are those threads which are completely implemented in user space using some library which is implemented above kernel. These threads are easy to manage and schedule as whole logic is in application itself. Also, for obvious reasons, application using user level threads is portable to any kernel. No kernel mode privileges are required for switching between threads.
Problem with these threads is that kernel is totally unaware of multithreading at application and any blocking system call will block the whole process.So, these kind of threads are useful in application when thread are in runnable for most of their lifetime. Also, thread needs to give up control voluntarily, else no other thread in process will be able to run.
Use space threads
When thread is implemented in user space, application has to maintain its thread table to keep track of thread status.

Kernel space threads are threads which are implemented in kernel and there is no logic required at application level. Thread creating, management and scheduling is done in kernel space. Advantage is that kernel is aware of threading model and scheduling can be done accordingly.
Drawback of this mode is kernel has to keep track of threads, synchronizations and scheduling, which will be overhead.
To get best of both worlds, we have something called as hybrid models and that’s where above mentioned threading models come into picture, Let’s see them one by one.
1 : 1 model
In this model, for each thread in user space, there is corresponding thread in kernel space, so in true sense kernel can leverage the multiprocessor by scheduling different threads on different processors.
Window server usually follow this model.
1:1 Model
M:1 model
In this model, for M user space threads there is only on thread in kernel space, It is as good as implementing user space thread because once there is system call to kernel, process will be blocked as only once thread is there in kernel. This can be overcome by replacing blocking system calls to kernel with non blocking equivalents, however many systems cannot be implemented as non blocking version.
M:1 Model
M:N model
This model in true sense leverage the threading model, where for M threads in user space, kernel has N threads, there is no predefined one to one mapping between user space threads and kernel space thread and it is done dynamically.
M:N Thread model
Problem is that scheduling needs to be implemented at both user space as well as kernel space.

Find median of two sorted arrays of different sizes

Median of two arrays of different sizes

Given two sorted arrays of unequal sizes, find median of union of them. Same problem has been solved for equal size arrays here  : Find median of two sorted arrays

Analysis

As per the definition of median, median provides the middle of the range of the elements, that is, all the elements on left side are smaller than median and all the elements on right side are larger. 

Brute force algorithm will be to merge two sorted arrays and take middle or average of middle and middle + 1 element and return. That requires O(m+n) time as nay merge will at least take m+n.

Can we optimize the algorithm or modify? Do we need to completely sort elements in order to get middle of resultant array? Answer is No. Please follow this post to understand how we find middle of two sorted arrays of unequal length in O(log(m+n)) time :  Find Kth smallest element in two sorted arrays

Idea here is to find middle and middle+1 element using above algorithm and tada! we are done. Just return the average of two.

Algorithm to find median of union of two sorted arrays

1. Find the middle of two sorted array, if they are combined together.
2. Find middle +1 element in same arrays.
3. Get the average of two and return as median.

Only doubt will be that, whenever we are discarding parts of arrays, they might not be equal, so the resultant median of these sub arrays may not be equal to median of original arrays.
So, we need to find the minimum of k/2 or the current size left when function is invoked, that’s why you will see i and j are found using this information.

Let’s code!

Code

Complexity analysis

Complexity of code to find median of two sorted array with above algorithm is O(log (m+n)).

Please comment if you find something wrong or you have another way of implementing it. Thanks!

Find Kth smallest element in two sorted arrays

Find Kth smallest element in two sorted arrays

Earlier on this blog, we discussed a problem where, we found K smallest elements in a array. There are different ways to do, like sorting the array and take first K elements, or optimized quick sort, or using min heap. Detailed explanation for all methods is discussed here at Find K smallest elements in array.

Today, our problem is to find Kth smallest element in two sorted arrays. In another words, given two sorted arrays, find Kth smallest element of union of these two arrays.

For example if A = [10, 20, 40, 60] and B =[15, 35, 50, 70, 100] and K = 4 then solution should be 35 because union of above arrays will be C = [10,15,20,35,40,50,60,70,100] and fourth smallest element is 35.

Kth smallest element in two sorted arrays : analysis

One way of solving problem is merging both arrays into one sorted array and then return the Kth smallest element of merged array. This will require O(M+N) additional space. Also, complexity of sorting will be O((M+N)log(M+N)) where M and N are sizes of two sorted arrays. Can we do better than this? Do we need to completely merge two arrays or based on some information at hand at point any point of time, we can discard some portion of arrays?

Discard. This word rings bells, divide and  conquer 🙂 How can we apply that here is the next question.  Find K/2th index from first array, call it i and K/2th index from second, call it j.  Now consider this:

1.If A[i] > B[j],then Kth smallest element will include elements more than K/2 elements from array B and less than K/2 elements from array A. So our area of searching reduces to left part of K/2 index in array A and right part of k/2 index in array B, as shown in figure below.
Again, since we have already found j definite smallest elements, we will reduce the search for (k-j)th smallest element in reduced search space.

2. Similarly, if A[i] < B[j], our search space reduces to right part of K/2 index in array A and left of K/2 elements of array B, as shown in figure below. Again since we have already found i definite smallest elements, we will reduce the search for (k-i)th smallest element in reduced search space.

So what will be the base case: When K is one, then we return the minimum of A[i] and B[j] (considering the zero based indexing of array) or if one of the array becomes null then return the other array's Kth smallest element.
OK, we are all set to code now. It's really simple.

Kth-smallest-element-in-sorted-arrays

Code to find Kth smallest element in two sorted arrays

#include <stdio.h>

#define min(a,b) a > b ? b:a

int findKthSmallestElement(int a[], int b[],
       int sizeA, int sizeB, int k){
  /* to maintain uniformaty, we will assume 
  that size_a is smaller than size_b
  else we will swap array in call */
  if(sizeA > sizeB)
    return findKthSmallestElement(b, a, sizeB, sizeA, k);

  /* Now case when size of smaller array is 0 
    i.e there is no elemt in one array*/
  if(sizeA == 0 && sizeB >0)
    return b[k-1]; // due to zero based index
  
  /* case where K ==1 that means we have hit limit */
  if(k ==1)
    return min(a[0], b[0]);

  /* Now the divide and conquer part */
  // K should be less than the size of array  
  int i =  min(sizeA, k/2) ;
  // K should be less than the size of array   
  int j =  min(sizeB, k/2) ; 

  if(a[i-1] > b[j-1])
    // Now we need to find only K-j th element
    return findKthSmallestElement(a, b+j, i, (sizeB-j), k-j);
  else
    return findKthSmallestElement(a+i, b, (sizeA-i), j, k-i);

  return -1;
}
int main(){
  int a[] = {10,30,40,50,60};
  int b[] = {30,50,100,110, 200};

  int sizeA  = sizeof(a)/sizeof(a[0]);
  int sizeB  = sizeof(b)/sizeof(b[0]);

  printf("\n Kth smallest element is : %d",
         findKthSmallestElement(a,b,sizeA, sizeB, 4));
  return 0;
}

Complexity of algorithm to find Kth smallest element in union of two arrays is O(log(N+M)).

Please share if there is something wrong or missing. we would be glad to hear from you.