How cliquematch works

cliquematch implements an exact algorithm to find maximum cliques, with a lot of pruning optimizations for large sparse graphs. The algorithm used is similar to FMC [1] and PMC [2], but also uses the concept of bitset-compression from BBMC [3] to save memory. It is implemented in C++11, does not use any additional libraries apart from the C++ STL, and is single-threaded.

Avoiding memory allocations

In earlier versions of cliquematch, the clique search methods were occasionally slowed due to many requests of small amounts from heap memory. In version 2.1, cliquematch does exactly one heap allocation – a std::vector large enough to hold a maximum clique – at the start of every search method, be it the heuristic search, finding one clique or enumerating cliques. During the clique search, it performs zero memory allocations. After a clique is found, it requests memory to return the clique to the user. Thus the number of heap allocations is mostly constant: a set number of allocations to initialize the Graph object (changes slightly depending on size of the graph), zero allocations during the search, and one allocation per clique returned.

This is possible because clique size always has an upper bound (degree, core-number, or the measure used in cliquematch all provide such a bound) and because bitsets are used for representing branches of the clique search. Thus cliquematch computes an upper bound for clique size (ie how deep a search can go), the maximum bitset space required for a vertex ( \(\lceil \frac{d_{max}}{32} \rceil\) ) and allocates space accordingly. The allocated space is reused throughout the search, thus avoiding any heap allocations during the most used part of the program.

Benchmarking the cliquematch algorithm

The below table shows how cliquematch compares to FMC and PMC.

\(name\)

\(V\)

\(E\)

\(w\)

\(t_{cm}\)

\(t_{fmc}\)

\(t_{pmc}\)

\(t_{cm\_heur}\)

\(w_{cm\_heur}\)

\(t_{fmc\_heur}\)

\(w_{fmc\_heur}\)

Erdos02

6927

8472

7

0.0001

0.0015

0.0022

0.0001

6

0.0009

7

Erdos972

5488

7085

7

0.0001

0.0004

0.0016

0.0001

7

0.0002

6

Erdos982

5822

7375

7

0.0001

0.0004

0.0018

0.0

7

0.0002

7

Erdos992

6100

7515

8

0.0001

0.0004

0.0018

0.0

8

0.0002

8

Fault_639

638802

14626683

18

2.0622

13.8244

0.5864

18

2.4625

18

brock200_2

200

9876

12

0.217

0.6421

0.0016

0.0019

10

0.0023

9

c-fat200-5

200

8473

58

0.0023

0.4169

0.0003

0.0001

58

0.0102

58

ca-AstroPh

18772

198110

57

0.0004

0.0789

0.0132

0.0003

56

0.0277

57

ca-CondMat

23133

93497

26

0.0003

0.0048

0.0083

0.0002

26

0.0042

26

ca-GrQc

5242

14496

44

0.0001

0.0005

0.0017

0.0001

43

0.0012

44

ca-HepPh

12008

118521

239

0.0034

0.0131

0.0149

0.0012

239

0.2518

239

ca-HepTh

9877

25998

32

0.0001

0.0007

0.0035

0.0

32

0.0001

32

caidaRouterLevel

192244

609066

17

0.0257

0.2098

0.0774

0.0084

17

0.073

15

coPapersCiteseer

434102

16036720

845

0.0248

0.8344

1.4019

0.0144

845

14.6734

845

com-Youtube

1134890

2987624

17

1.1393

9.618

0.1412

16

0.3515

13

cond-mat-2003

31163

120029

25

0.0007

0.0118

0.0096

0.0004

25

0.0038

25

cti

16840

48232

3

0.0014

0.0036

0.0046

0.0009

3

0.0013

3

hamming6-4

64

704

4

0.0002

0.0003

8.5115

0.0

4

5.6982

4

johnson8-4-4

70

1855

14

0.0405

0.1459

0.0003

0.0001

11

0.0005

14

keller4

171

9435

11

3.4486

15.4211

0.0016

9

0.0027

9

loc-Brightkite

58228

214078

37

5.0651

2.7146

0.0277

0.0038

36

0.0151

31

  • fmc -t 0 -p was used to run the FMC branch-and-bound algorithm.

  • pmc -t 1 -r 1 -a 0 -h 0 -d (single CPU thread, reduce wait time of 1 second, full algorithm, skip heuristic, search in descending order) was used to run the PMC branch-and-bound algorithm.

  • fmc -t 1 was used to run the FMC heuristic algorithm.

  • A small script was used to run the cliquematch algorithm. cliquematch was compiled with BENCHMARKING=1.

The benchmark graphs were taken from the UFSparse [4], SNAP [5], and DIMACS [6] collections. \(|V|\) and \(|E|\) denote the number of nodes and edges in the graph. \(\omega\) denotes the size of the maximum clique found. All the branch-and-bound methods agreed on the maximum clique size in every benchmark. \(t_{cm}\), \(t_{fmc}\), and \(t_{pmc}\) denote the time taken by cliquematch, FMC, and PMC respectively in the branch-and-bound search: the least time is in bold text. \(w_{cm-heur}\), \(t_{cm-heur}\) denote the size of clique and time taken by the heuristic method in cliquematch; and similarly for \(w_{fmc-heur}\) and \(t_{fmc-heur}\). A minus sign (-) indicates that the program returned an error without completing the calculation.

Correspondence Graphs in C++

The correspondence graph classes are generated using simple C++ template programming (simple because it’s basically a typed macro substitution). cliquematch can be extended with custom correspondence graphs: they can be prototyped using the existing classes, and/or implemented in C++ for performance. The source code of IsoGraph or AlignGraph are good places to start if you’re trying to implement a custom correspondence graph class.

References

1

Bharath Pattabiraman, Md Mostofa Ali Patwary, Assefaw H Gebremedhin, Wei-keng Liao, and Alok Choudhary. Fast algorithms for the maximum clique problem on massive graphs with applications to overlapping community detection. Internet Mathematics, 11(4-5):421–448, 2015.

2

Ryan A Rossi, David F Gleich, and Assefaw H Gebremedhin. Parallel maximum clique algorithms with applications to network analysis. SIAM Journal on Scientific Computing, 37(5):C589–C616, 2015.

3

Pablo San Segundo, Diego Rodríguez-Losada, and Agustín Jiménez. An exact bit-parallel algorithm for the maximum clique problem. Computers & Operations Research, 38(2):571–581, 2011.

4

Timothy A Davis and Yifan Hu. The university of florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS), 38(1):1–25, 2011.

5

Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. https://snap.stanford.edu/data, June 2014.

6

David S Johnson and Michael A Trick. Cliques, coloring, and satisfiability: second DIMACS implementation challenge, October 11-13, 1993. Volume 26. American Mathematical Soc., 1996.