Google Summer of Code 2018

A Linear Time Implementation of SPQR-Trees

This project is part of GSoC-2018. The objective of this project is to implement in Python the linear time algorithm first proposed by Hopcroft and Tarjan[1] and later corrected by Gutwenger and Mutzel[2] to decompose a (multi)graph into triconnected components and organize these components into a SPQR-tree. The code is to be integrated into SageMath graph library. This project has been conducted under the guidance of David Coudert and Dima Pasechnik.


The Journey

  • Getting started

  • In the initial weeks, to get a hands-on experience on sagemath development, I started working on #25123(Merged) which estimates the closeness centrality of the nodes in a graph.

    Definition: closeness centrality (or closeness) of a node in a connected undirected graph is calculated as the inverse of average of the shortest paths between the node to all other nodes in the graph. Thus the more central a node is, the closer it is to all other nodes.

    It was first defined by Bavelas (1950)[3], on a connected undirected Graph G = (V, E), |V| = n = number of vertices , an inverse of farness, where farness is the inverse of average of the shortest paths from v to u \( \in \) V.

    Algorithm:

    Input: Connected Undirected Graph

    Output: Dict with vertices as key and value as closeness centrality

  1. \(S\) = k random vertices from n vertices of input graph G.

  2. Run BFS k times from the vertices in \(S\), which will give k*n distances.

  3. for each vertex in \(S\), closeness centrality will be actual value, which is

  4. \( C^{-1}(v) = (n-1)/\sum\limits_{u \in V} d_{vu} \)
  5. for remaining (n - k) vertices we need to estimate the closeness centrality which is calculated as:

  6. \( C^{-1}(v) = k/\sum\limits_{u \in S} d_{vu} \)

While implementing this algorithm in Python, we found that there is a scope to improve the reusablility of simple_bfs function, which leads to create the patch #25223(Merged), in which we made simple_bfs function more generic and updated all the functions which are dependent on simple_bfs function. By the end of these two patches, I got good exposure of sagemath code base.

  • Start working on SPQR-Tree

Starting the coding part

We need to decide in which file we are supposed to add the function, this leads to create a new file connectivity.pyx , it includes all the functions from generic_graph.py / graph.py / digraph.py which uses cut vertices/cut edges. Ticket #25436(Merged) which is completed with the help of Meghana M Reddy.

Before beginning with the SPQR-Tree linear implementation part, I worked on finalizing a longstanding open ticket #22157(Merged) on decomposing a graph into 3-vertex-connected components and constructing a SPQR tree. The idea is to use a method based on integer linear programming to find 2 vertex cuts of the graph, then construct the SPQR tree. Even though this method is not the best choice with respect to the computation time, it is not difficult to check the correctness of the code.

The main motivation behind finalizing this ticket was to be able to compare the results of the method 1 & 2 and the correctness.

Below is the brief explanation of method 1.


Method 1: Cleave



Cleave: for a connected unweighted multi-graph G and a vertex cut X, it computes the list of subgraphs of \(G\) induced by each connected component \(c\) of \(G\setminus X\) plus \(X\), i.e., \(G[c\cup X]\)

  • It returns 3 list,

  • 1. S = List of connected component with cut vertices have virtual edges.

  • 2. C = is the graph of the cocycles, set of virtual edges created/added in connected components + one extra copy.

  • 3. f = It's a graph with cut vertices and virtual edges. i.e the complement of the subgraph of ``G`` induced by cut vertices

  • Algorithm:

  • Input: Multi-Graph G, cut_vertex list

  • Output: S, C, f

  1. If cut vertices are given, validate it else create a cut vertex list from graph G.

  2. Find connected components after removing cut vertices from the cut vertex list.

  3. To find virtual edges which are to be added, firstly find a subgraph with cut vertices then construct its complement graph which is f(virtual cut graph), all the edges f has will be added in the connected components,

  4. for all connected components, construct a subgraph with connected component vertices + cut vertices after that add all the edges we found in step 3, this will give us S.

  5. cocycles = no. of edges in f (step 3) * (no. of connected component in (step 4) + 1).

This method is extended with allowing the user to disable adding of virtual edges in connected components, if virtual_edge==false then f is a graph with isolated cut vertices i.e independent set.


Step 2: Construction of SPQR-Tree using Cleave function as helper function.

Definition: The data structure SPQR-tree represents the decomposition of a biconnected graph with respect to its triconnected components.

The vertices of SPQR-Tree could be of 4 types given below.

  1. S (Series) Type: A cyclic graph with atleast 3 vertices.

  2. P (parallel) Type: A multi-graph with 2 vertices and multiple edges between them.

  3. Q (Trivial case) Type: The associated graph has a single real edge. This trivial case is necessary to handle the graph that has only one edge.

  4. R (Rigid) Type: A 3-connected component.

Algorithm:

Input: Multi-Graph G

Output: SPQR-Tree

  1. Find cut vertices of input multi-graph G.

  2. If G has multiple edges, construct a simple graph SG, and store all the multiple edges in counter, if (u, v) has 4 multiple edges between them, then all 4 will be present in counter.

  3. If SG is a cyclic graph then return SPQR-Tree with one vertex of S which has all the edges of SG, and create P type vertices from counter.

  4. If SG is not a cyclic graph,

    create a queue push (SG, cut_vertices)

    while queue is not empty

      B, B_cut = queue.pop()

      S, C, f = cleave(B, cut_vertices=B_cut)

      for each component in S,

        if component is a cycle append in S_block,

        else check if it's a 2-connected block then queue.push(component, it's cut vertices)

        if not 2-connected block then push it to R_blocks list.

  5. create the P_block with the help of virtual edges which are added in S(step 4)

  6. Creation of Tree: Create tree nodes from S_block, R_block and P_block.

      for each edge in counter_multiple_edges:

        P_block = ('P', [e]*num)

         for all blocks in S & R:

           if any block contains this edge, create an edge between Tree.add_edge(p_block, block)

The second most important thing is to verify the output of our function, so we created a function which takes input as SPQR-Tree and outputs Graph H. If H.is_isomorphic(G) ==True then SPQR-Tree is correct.

Algorithm:

Input: SPQR-Tree

Output: A Graph through which input SPQR-Tree is generated.

  1. Make two dictionary namely count_p and count_g. for all vertices in input SPQR-Tree, if the vertex has P or Q label, update count_p with edges and their count. similarly, for S and R label vertices, update count_g dict.

  2. Iterate over all the edges of count_g if that edge is available then num = count_p[edge] - count_g[e] and add num, no. of times edge in H.

  3. Now Iterate over all the edges in count_p and if that edge is not available in count_g, directly add that edge with it's respective count from count_p in H.

  4. Return H

Method 2: Hopcroft_Tarjan[1]

The linear time algorithm of SPQR-Tree is:

Input: Multi-Graph

output: SPQR-Tree

It consists of 4 modules

Module 1: Appending multiple edges from input graph G to respective components (split_multi_egdes()), in this module a simple graph is constructed from the input graph and all the multiple edges are stored in components.

Module 2: Construction of an Acceptable Adjacency Structure, in this module an adjacency structure of the simple graph is constructed using lowpt1 and lowpt2[2] which are found using dfs1 traversal[2].

Module 3: Finding Triconnected Components, in this module first we need to find separation pairs from the split components[2], in which we label the pair which is type-1 or type-2[2]. After finding separation pair find triconnected components using those points.

Module 4: Construction of SPQR-Tree, after finding triconnected components which is the most challenging part, construction of tree becomes easier, connect all the components(tree vertices) using virtual edges, one virtual edge will connect two components(two tree vertices).

Module 1 & 4 are done by me and remaining Module 2 and Module 3 are done by Meghana M Reddy.

The output of the code is tested by spqrtree_to_graph function i.e which is developed in method 1. Method 1 mostly act as verification code for method 2. Method 2 running time is in linear order which is better than method 1.

The complete work of method 2 is done in ticket #25598, it's in the process of getting merge.

Conclusion:

This project is the first open-source python implementation of static SPQR-Tree in linear time, the extension of this work could be the implementation of dynamic SPQR-Tree/cythonizing the implementation of method 2 or it's application on planar graph embedding.

GSoC Experience:

It was a great experience working with SageMath, I learnt how one should code/write the comments such that if anyone is going to use/modify my work should spend the least time in understanding it. Not only this, it gave me an opportunity to explore how actually open source organization works. I am grateful that I got this opportunity. My mentors David Coudert and Dima Pasechnik were very supportive not only in giving guidance but also helpful in coding side.

Acknowledgment:

I would like to take this opportunity to thank my mentors David Coudert and Dima Pasechnik for their invaluable support during the course of the project. The work on patches could not have been possible without their support. At last, I would like to thank GSOC for providing the platform for students to be active in open source development. It is through gsoc that I found out about SageMath and got the opportunity to contribute to it.

Reference:

[1] J. E. Hopcroft, R. E. Tarjan: Dividing a Graph into Triconnected Components. SIAM J. Comput. 2(3): 135-158 (1973)

[2] C. Gutwenger, P. Mutzel: A Linear Time Implementation of SPQR-Trees. Graph Drawing 2000: 77-90

[3] Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F. Werneck. Computing classic closeness centrality, at scale. CoRR, abs/1409.0035, 2014.