The purpose of this article is to briefly present what the content of my bachelor thesis was. Detailed considerations will not be given in this article. Instead, the individual parts of the bachelor thesis are briefly presented. For more detailed explanations of the individual parts, the interested reader can download the complete thesis from the servers of the University of Rostock. At the end of this article the corresponding link can be found.
Graph Mining for Hypergraphs
My bachelor thesis is generally about how to use current graph mining techniques to analyze hypergraphs. For this purpose, I first investigated how a hypergraph can be transformed into a graph structure. Afterwards, the thesis deals with the field of graph mining and introduces the different subfields like clustering and classification before the subfield of Frequent Subgraph Mining (FSM for short) is discussed in more detail.
Frequent Subgraph Mining
This subfield of graph mining is concerned with finding (automatically) frequently recurring patterns in one or more graphs. A distinction can be made between the scenarios of a graph set and a single graph for which the analysis is to be performed. The thesis focuses on the scenario of a single graph and considers in this context several relevant problems, such as the isomorphism of graphs, the so-called candidate generation or the significance calculation. In the corresponding chapter, GraMi, a concrete algorithm for the FSM, is also discussed and explained in detail.
Parallelization Platforms
Another goal of the work is to investigate how FSM algorithms can be parallelized. In this context, the two frameworks Apache Spark and Apache Flink were examined in more detail. These are so-called cluster computing frameworks. This means that with the help of these technologies, certain calculations can be divided among several computers that can communicate with each other in a network. This enables the user of such frameworks to parallelize his calculations on several computers, so that certain components of the program can be calculated simultaneously on these computers. This can lead under certain circumstances to a high time saving.
PaSiGraM – Parallel Single Graph Mining
In a further chapter of the bachelor thesis, it is investigated which FSM algorithms are suitable for parallelization or which concrete algorithms have already been parallelized using the above-mentioned frameworks. It was found that there is no algorithm that can enable parallel computation on a computer cluster in the context of FSM on a single graph.
This was the motivation to develop PaSiGraM, an algorithm that solves the individual problems of the FSM and can be computed in parallel on a cluster consisting of any number of computers.
The bachelor thesis can be downloaded here (in German) from the servers of the University of Rostock.