Graphs : Basics and representation

Graphs

A graph is collection of edges E {e1,e2,3…..en} and vertices V = {v1,v2,v3…vn}. Mathematically, graph is represented as  G = (V,E) An edge is pair of vertices (vi, vj) where i,j <= n

There are many applications of graph in computer science, like representing connection of roads, flight paths, internet etc.

Undirected and directed graphs
Graphs can be undirected or directed. A directed graph is where edge is one way from one vertex to another. While undirected graph has two way edges, that is, there is no arrow head at the end of edge.

Number of edges
A directed graph can have at most N^2 edges, one of each pair, while undirected graph will have C(n+1,2) edges. Sparse graph is a graph where number of edges is less than above given number.

Paths
Paths  in a graph is a sequence of vertices {v1,v2….Vk} such that there exists an edge between consecutive vertices. Formally, we can say that v1,v2,…vk is a path if (vi, vi+1) belongs to edges.

In-degree and out-degree
In-degree is number of edges which are ending on a given vertex. Out-degree is number of edges going out of given vertex.

Let’s see how above terminology applies for below graph.

Graph

Vertex list v = {1,2,3,4,5,6}
Edge list e = {(2,1)(2,6)(1,4)(4,3)(3,5)(3,6)(5,6)}
Path p = {1,4,3,5,6}
In degree of 3 is 1, out degree of 3 is 2

Graphs representations

There are two methods by which we can represent graphs.


1. Adjacency matrix
In this representation, we use two dimensional matrix. Matrix(i,j) is set to true if there is an edge between vertex i and vertex j.

Advantage of this representation is that is very simple to implement. However, when number of edges between vertices are sparse, lot of cells remain vacant.
Complexity of graph algorithms is measured in terms of E and V where E is number of edges and V is number of vertices.
In this representation memory used to represent graph is O(V^2). If graph is undirected then when there is an edge between (u,v), there is also an edge between (v,u). So transpose of adjacency matrix is same as original. So we can save half the space in case we are representing an undirected graph using adjacency matrix. Moreover, if there are no weighted edges in graph, instead of using one byte, we can use one bit to indicate edge.

Above graph can be represented as

Adjacency matrix represnetation

Code

2. Adjacency list
In this representation, we have a table of all vertices of graph. Each entry in that table will have list of vertices which it is connected to. It is useful where graph is sparse.
Memory required for this representation of graph is O(V +E ).

Adjacency list representation can be easily extended to represent graphs with weighted edges. Node contains another parameter wt. For edge (u.v)  node in adjacency list of u will have weight of edge.

Above graph can be represented in adjacency list as

Adjacency list representation of graph

Code

Find an element in row and column sorted matrix

Problem statement

Given a row and column sorted matrix, find an element in it.

Example of row and column sorted matrix is as follows.
[1,2,3,4,5
6,7,8,9,10
11,1,2,13,14,15
16,17,18,19,20
21,22,23,24,25]

Analysis

Since row and column are individually sorted, we can start from the leftmost bottom corner and see if the element to be searched is less than or greater than element at leftmost bottom cell element. If it is less then we move up in the same column, and if it is greater than we move right in same row.

Complexity of this algorithm is O(N+M) where N is number of rows and M is number of columns.

For further details on above algorithm, please refer to this post :  Binary search and related problems
Now we need to figure out if we can do better than O(N+M)?  Can we discard any part of the matrix at any given point of time?
If we take the middle element of matrix, what can be said about four quadrant middle element divides it into?

Left top quadrant contains all elements which are less than the middle element for sure.
Right top contains less than as well as greater than middle element. No use.
Right bottom contains all elements which are greater than middle element.
Left bottom quadrant contains all elements which are either less or greater than middle element. No use.

So, there are two quadrant which contain either all less than or all greater than middle element. So, if based on relation ship between element being searched and middle element, we can eliminate one of the two quadrants. So we are reducing our problem to 3/4th of the original problem.

Case 1 :
If middle element is less than element to be searched, then we need to search in all quadrants which may contain elements greater than middle. Discard the quadrant which contains all elements less than middle.

If element searched is less than middle, yellow portion to be looked into

Case 2:
If middle element is greater than element being searched, then we need to search in all quadrant which may contain  elements less than middle element. Discard quadrant which contains all elements greater than middle.

If element searched is greater than middle, yellow potion to be searched

If we look closely at above cases, top right quadrant is always searched.

Code

Complexity Analysis

Complexity of above code is less than O(N^1.58). Every time we will have 3 N/2 size sub matrices.

Print unique rows in a boolean matrix

Problem statement

Given a matrix of 0 and 1, print all unique rows of it.
For example
Matrix  =  0 1 0 0 1
               1 0 1 1 1
               0 1 0 0 1
               1 0 1 0 0
Output should be
 0 1 0 0 1
 1 0 1 1 1
 1 0 1 0 0

Analysis

First solution which has complexity of O(N^2 x M) where is N is number of rows and M is number of columns is as follows.
For each row in matrix, check all rows if it matches with anyone of them. If yes, skip printing, else print the row.
Other way which is better than above approach is to convert each row in equivalent decimal and then check for duplicates. Again it has complexity of O(N^2 ) where N is number of rows and M as number of columns.
Can we do better than this? There is standard technique to check if a particular pattern is already seen or not. That’s using tries. Each pattern is added to trie, when we try to add a duplicate pattern, we will end up at the leaf node which is already marked as leaf due to earlier pattern.
With given info, we can say that we will insert each row pattern in trie. We will modify insert function of trie to return us whether the row is added newly or it was already present in trie. As explained above it can ascertained by the fact that if last node is marked as leaf node already, then it is duplicate row as we must have traveled same nodes above. If the last node is not leaf node already, then there is at least one digit which is different and hence row becomes unique. Mark last node of this pattern as leaf node, so that same patterns will be detected as duplicates.

Algorithm

  1. Add row in trie.
  2. If the last node while entering the pattern is leaf node, return false. It’s not unique row.
  3. If last node is not leaf node, mark it as leaf node and return true.
  4. If insert operation return true, print the row.
  5. Else, skip printing row.

Code

Complexity Analysis

Complexity of trie based solution is O(N x M).

Arrays : Merge overlapping intervals

Merge overlapping intervals

Given N intervals S = {e1,e2,…..en} each event has start and end time. Merge all overlapping intervals.
Events i and j overlap when start time of jth event is less than end time of ith event or vice-a-versa
For example, [(1, 3), (2, 4), (5, 8)] should be transformed into [(1, 4), (5,8)]

Analysis

This problem is quite simple but through this problem we can understand concept of stacks and array.

Let’s see brute force solution. We take ith event and compare start time of every jth event with end time of ith, if start time of jth event is less than end time of ith event and end time of jth event is greater than ith event then update the end time of ith event.

This solution has complexity of O(N^2).Can we do better?

How about sorting all events based on start time. Now if start time of ith event is greater than i-1 th event’s end time, than it will be sure that start time of i+1th event will be greater than i-1th event. hence we will need to compare current event with just previous event and update the end time of previous event in case of overlap.

We can see that as we need the last visited event every time we visit new event, stack will be the best data structure to be used.  Perform following steps:
  1. Consider event e.
  2. If stack is empty, push e to stack.
  3. If stack is not empty, then pop the event at top of stack call it e1.
  4. Compare start time of e with end time of event e1. 
  5. If start time of e is less than end time of e1, and end time of e is greater than end time of e1, update end time of e1 
  6. Push e1 back to stack.
  7. If condition 5 is not true, push e to stack. 
  8. Continue till all events are considered.

Code for merging overlapping intervals

Complexity Analysis

Complexity of algorithm to merge overlapping intervals will be O(N log N) due to sorting with O(N) extra space.

Strings : Check if string in interleaved of two strings.

Problem statement

Given string A,B and C, find if string C is interleaved of A and B.
C is said to be interleaved if it contains all characters of A and B and order of characters in respective string is maintained. For example string C in figure is interleaved string of A and B
Interleaved strings

Analysis

Simple algorithm goes like :
Let’s say we start with first character of C,A and B. Consider length of C as length of A + length of B. If it is not true return false (why?).
Now check first character of C and A. If they match, move to second character of C and A. Keep B at first character.
If above condition is not true, check if first character of C and B match, then move to second character C and B. A remains at first character.
Once done, again do above steps with new first characters of strings, while character in C matches character in A or B.
If both above conditions are false, return false.
Now conditions problem reduces to smaller problem with C with length N-1, one of A or B with M-1.
From above description, we can figure out that recursion can be used to solve this problem.
isInterleaved(A,B,C)  = isInterleaved(A+1, B, C+1) // If character of C matches with character of A
                                 || isInterleaved(A, B+1,C+1) // If character of C matches with character of B
What shall be the base case?
If we reach at the end of C, we have considered all characters, we can return true if all characters in other two strings are considered. If not returned false.

Code

Iterative implementation

Iterative implementation will not work with input where there are common characters in string A and B, for example A = XXY and B = XXZ and if C = XXZXXY will return false

Complexity Analysis

Complexity of above code will be linear O(N) N being length of string C, where as complexity of recursive solution will b O(2^N) but it does not fail in above mentioned case.

Dynamic programming approach

If we look closely, we can see that there are many sub problems which are being calculated again and again. Let’s look at recursion tree for input A = XXY and B = XXZ and C = XXZXXY
Recursion tree for sub problems

We get the idea that we need to store result of smaller sun problems, so that we do not calculate it again and again.

We create a two dimensional table. Table(i,j) = true only if C[i+j-1] is interleaved string if A[i] and B[j].
Empty string is interleaved of two other strings so,
Table[0][0] = true

If one of the strings was empty:
Table(i,0) = A[i] == C[i] && Table(i-1, 0) that is to say if till i-1 characters C was interleaved of A, then for ith character it will be true if ith character matches ith character of A. Note that B is null here
Again if string A is empty, then Table(0,j) = Table(0, j-1) . With same argument above.

With these base cases, we can fill table bottom up as follows
Table(i,j) = Table(i-1,j)  if (A[i] == C[i+j]) && (B[j] != C[i+j])
Table(i,j) = Table(i,j-1) (B[i] == C[i+j])&& (A[i] != C[i+j])

Table(i,j) = Table(i-1,j) || Table(i, j-1) if (A[i] == C[i+j]) && (B[j] == C[i+j])

Code

Complexity of above code will be O(N^2).

Binary Search Tree : Connect nodes at same level

Problem statement

Given a binary search tree with each node having an extra pointer next. Next should point to the node which is on the right side of the node and at the same level. Right most node points to NULL.

For example for below input tree

Input binary search tree

output should be

Output binary search tree

Analysis

We are dealing with all nodes on same level. So first thing which comes to mind is level order printing.

To understand level order printing, refer Binary Search Tree : Level order printing 

Now since rightmost node is the termination of the list formed by connecting nodes at same level, we need to first reach rightmost node at each level. We will be following the push approach to create linked list. Once we are at rightmost node, mark next pointer of that node as NULL.

Now return the pointer of this node and next of node on left side at same level will point to this node and so on till we reach the leftmost node at the level.

So we are doing two things. First find nodes at same level and then push those nodes in linked list one by one.
Finding nodes at same level is simple as explained in above link.Only difference will be that instead of traversing left to right, we will traverse right to left.
So when we find a node we just need to push that in the linked list, for which we need the current head of the linked list created till now.
Once we have that, we can just point next of this node to current head and return node which will serve as head of list for other nodes.

Same problem can be extrapolated to ask connect all leaf nodes of a binary tree using extra next pointer. In this case only the base case will change and we don’t required to go to all levels. Just the last level will do. Even that can be optimized by not calculating the height in advance and checking each node if that is leaf node, if yes push that onto linked list.

Code


Complexity Analysis 

Worst case complexity of above code will be O(N^2) where N is number of node, when tree is completely skewed.

Dynamic programming : Contiguous sub array with largest sum

Largest sum contiguous subarray (Kadane algorithm)

Given an array of integers, find a subarray with largest sum. This problem is solved using Kadane’s algorithm.
For example, for array {-1, 3, -5, 4, 6, -1, 2, -7, 13, -3} maximum sum sub array will be {4,6,-1,2,-7,13} with sum = 17

Analysis 

Largest sum subarray can be find using dynamic programming approach,  At every number in array, we need to take decision on two possibilities:
1. Number is included in already existing sub array and it increases the sum.
2. Number is not included as it makes the sum negative and adding this will make all subsequent addition less than what we already have.

To calculate sum till index i we can use sum we have already calculated till i-1 index.
If adding element at i makes the sum negative, we would drop the element, as having this element in the sub array will only decrease the already existing sum as well as sum all the elements subsequent of this element. Hence we would start afresh from i+1 element.

Even if the sum is greater than zero, we need to check if it greater than the sum till i-1 index. If it is then ith element is added in the sub array and rightmost index of sub array becomes i. 
If  sum is less than max sum we have seen til i-1 index then we don’t change the rightmost index of sub array.
If addition of ith element make sum to be negative, we make current sum as zero and start all over again with i+1 index. 

Above algorithm does not work with all numbers being negative. What we can do is that scan all elements of array prior to application of this algorithm and find out if there is at least one non negative number. Also during this phase keep track of the largest number we have encountered. If all elements are negative, just return largest number.

Code for Kadane algorithm

Complexity analysis

Complexity of finding largest sum subarray in array is O(N) in time and O(1) in space.