Strongly connected components – Tarjan Algorithm

Strongly connected components in graph – Tarjan Algorithm

What are strongly connected components in a graph? A directed graph is said to be strongly connected if we can reach any vertex from every other vertex in graph. Strongly connected components of directed graphs is subset of the vertices in the given graph such that any two vertices of this subset are reachable from each other ,i.e
 Forall u, v in C
 u mapsto v, v mapsto u  where u,v are the vertices of the graph and   Mapsto denotes reachability ,i.e existence of path from first vertex to the second vertex.You can see it from the diagram below:
Strongly connected component
You can see different strongly connected components in the given graph shown by different colors.

Strongly connected components : Algorithm

There are two algorithm to find strongly connected components one is Kosaraju’s algorithm and another one is Tarjan algorithm . Tarjan algorithm requires only one depth first search traversal to find out the all strongly connected present in the graph.So it is efficient as compared to Kosaraju’s algorithm as it requires two Depth First traversal (DFS).
For studying the algorithm we should know DFS traversal creates a DFS tree. By DFS tree I mean, a node is not encountered more than once while traversing graph depth first. For example if take above graph and do a DFS on it. node being traversed will be in this order:
Tarjan
We start at ‘a’, there is only one way to move forward and that is to go to ‘b’. At this point, our DFS tree has two node as shown below.

Now at ‘b’ there are three choices. Either we can go to ‘c’ or ‘f’ or’e’. This will result in three children of node ‘b’ in DFS tree as shown below. Again, any one of the three nodes will be selected. Let’s say it’s ‘e’. Is there any node we can go from node ‘e’? No, hence we will move back to it’s parent ‘b’ and explore other nodes. DFS tree at this point will be
strongly connected components example

‘f’ will be selected next and then ‘g’ is visited and then ‘h’. At h there is no node which can be visited. Move back to ‘g’, no further node to be traversed. DFS tree is like this
strongly connected components implementation

Now we traverse node ‘c’ and its descended the same fashion. Final DFS tree is shown below.
Tarjan
How DFS tree is related to Strongly connected Components(SCC)?
A sub tree with a node as root, is a strongly connected components if there is no back link from any descendant of the node to any ascendant of the node. A bit confusing. In simple term there should not be a link between a node which is traversed after the root node which allows to traverse a node which is traversed earlier than root node. Below figure explains concept of back links, descendant and ascendant clearly.  
strongly connected components in graph

If  ‘x’ is the root of the sub tree and ‘y’ is descendant of ‘x’ and there is an edge from ‘y’ to ‘z'(ancestor of  x) then we can say that edge z-y together with the paths in the DFS tree from z to x and from x to z form a cycle, which must all be part of the same component. But y is higher or equal in the tree than x, so x can not be the head of this component thereby contradicting our above statement.
Explanation of Algorithm?

Vertex are placed in stack S in order in which they are visited so when  DFS recursively explores a vertex v and it’s descendants ,those nodes are not necessarily popped from stack S before this recursive call returns.Thing which we will consider in building the stack is that a vertex will remain in stack after exploration if and only if it has a path to some vertex earlier in stack. At the end of recursive call that explores v and it’s descendant ,we know whether v itself has a path or not to any vertex earlier on the stack.If so,call returns ,leaving v on the stack S.If not then v must be the root of its strongly connected component at this stage,stack S consist of v together with it’s descendants(these nodes have paths back to v but not to any earlier node,because if they have path to earlier node then v would also have paths to earlier nodes which is false thereby v will not be the root of this subtree) .Now all elements which are popped from the stack at this stage which are strongly connected components.

For doing this ,we define two variables for each vertex, one is low_index and other is index.

For example,if we are exploring descendants of ‘u’ then value of low_index of the child ‘v’ will be changed in two ways:
1)If node ‘v’ is visited for the first time:
low_index(v) = min(low_index(u),low_index(v))

2)If node ‘v’ is already visited(back edge):
low_index(v) = min(low_index(u),index(v))

In second case we are considering index(v) instead of low_index(v) because v is?

How can we determine that there is no edge from v or decedent of v to u or its ancestor? For that we need to maintain time when a node was discovered during depth first search, call is d[].  d[u] is time when node u was discovered using DFS. If d[u] < d[v], means u was discovered before v. Below is graph with d[] filled for each node.
Tarjan

Now, figure out the lowest d[x] which can be reached from each node. Reason to find that is to see if there is a node x which is reachable from children of v and has d[x] less than u,i.e. x is ancestor of u reachable from children of v. Store lowest DFS ancestor reachable from node u in an array low[u]. So low[u] = min(low[u], low[v])  for edge (u,v)

Idea here is that if (u,v) is an edge, then either there is no back edge from sub tree of v to u or ancestor of u, in that case low[u] will be lowest. If there is a back edge to x from sub tree of v, then minimum d[x] reached by node in sub tree will be low[u].

Below diagram shows calculation of low[] in graph.
Tarjan

Finally, if low[v] > disc [u] that means if discovery time of u is less than least ancestor that can be reached from sub tree of v, we have a bridge, because there is no way we can reach to any ancestor of u once we disconnect edge (u,v).

Strongly connected components : Tarjan Algorithm implementation

#define MAX 10000
stack<int> S;
int disc = 0;  //discovery time of vertex
int in_stack[MAX]; //used to check whether element is present in stack

void tarjan_algorithm(int u,vector< vector<int> > g,
                      int index[],int low_index[]){

    index[u] = low_index[u] = ++disc;
    in_stack[u] = 1;
    S.push(u);

    for(int i = 0;i < g[u].size();i++){
        int v = g[u][i]; 
        if(index[v] == -1) {
            //node is not visited yet
            tarjan_algorithm(v,g,index,low_index);
            low_index[u] = min(low_index[v],low_index[u]);
         }
         //back edge(in the tree)
         else if(in_stack[v] == 1){
              //element 'v' is already present in stack 
             low_index[u] = min(low_index[u],index[v]);
         }
      }
      if(index[u] == low_index[u]) {
          //strongly connected component.
          while(S.empty() == false && index[u] == low_index[S.top()] ){
              cout<<S.top()<<" ";
              in_stack[S.top()] = 0;	
              S.pop();
          }
	  cout<<endl;
      }
}

As we see from implementation that only one DFS traversal is done, therefore complexity of Tarjan algorithm to find strongly connected components in a graph will be  O(|V| + |E|), here V and E are total number of vertex and edges present in the graph.

Please share if there is something wrong or missing. If you are interested in contributing to website, please contact us and earn too.