About me

Ole Fenske

PhD student at the University of Rostock

My name is Ole Fenske, and I currently work as a PhD student at the University of Rostock. During my studies of business informatics at the University of Rostock, I specialized in various topics related to artificial intelligence. My main interest lies in machine learning and its learning algorithms. More specifically, I work with neural networks and explore biologically plausible or neuro-inspired learning methods that enable neural networks to learn more efficiently. This includes topics like integrating a priori and expert knowledge into neural networks and investigating neuro-symbolic models.

As a business information scientist, I have additional knowledge in the areas of enterprise modeling, IT management, and project management.

I am proficient in Python and Java, which I mainly use to implement and test various algorithms in connection with my research. Besides that, I also work on small web projects of my own, more as an occasional pastime.

Research projects

In the context of the international ISEBEL project, we investigate the analysis of large amounts of graph data. More specifically, we focus on the scenario of Frequent Subgraph Mining on a single large connected graph.
My task was to implement the PaSiGraM algorithm, which I originally developed during my bachelor's thesis. I implemented it in Python: first a version that could be executed on a single computer, then a version scaled up to a cluster of machines. For the latter, I used the Apache Spark platform to parallelize the individual computations across multiple machines.
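
To give an idea of the parallelization pattern, below is a minimal, self-contained sketch with toy data; it is not the actual PaSiGraM code, and the support measure is a deliberately simplified placeholder. The single large graph is broadcast to every worker once, and the support of the candidate subgraphs is then counted in parallel:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fsm-sketch").getOrCreate()
sc = spark.sparkContext

# Toy stand-ins: the single large graph as an edge set, candidates as
# smaller edge sets. PaSiGraM works on proper graph structures instead.
graph_edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
candidates = [
    frozenset({("a", "b")}),
    frozenset({("a", "b"), ("b", "c")}),
    frozenset({("d", "e")}),
]
MIN_SUPPORT = 1

graph_bc = sc.broadcast(graph_edges)  # ship the graph to every worker once

def support(candidate):
    # Placeholder measure: 1 if all candidate edges occur in the graph.
    # Single-graph FSM normally uses an anti-monotone measure such as
    # minimum-image-based (MNI) support instead.
    return int(candidate.issubset(graph_bc.value))

# Count the support of every candidate in parallel, keep the frequent ones.
frequent = (
    sc.parallelize(candidates)
      .map(lambda c: (sorted(c), support(c)))
      .filter(lambda kv: kv[1] >= MIN_SUPPORT)
      .collect()
)
print(frequent)
spark.stop()
```
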
The whole code base can be found here, on my GitHub profile. Note that the algorithm is still under development: the base version works, but several improvements remain, as we want to further optimize its speed and output quality. We are also planning to publish a paper comparing PaSiGraM to other state-of-the-art algorithms such as GraMi.

The goal of this project was to compare and subsequently implement algorithms for the automatic generation of so-called reference models. ArchiMate models serve as the data basis for the generation. The ArchiMate editor, which can be used for the graphical modeling of different process models, defines a fixed XML-based file format for exchanging models between different instances of the software.
The developed solution automatically reads the respective models from the XML file and converts them into a specially implemented graph format. Based on this format, different algorithms, such as RefPa or MCC, can then be used for the automated creation of a reference model. Finally, the resulting models are converted from the graph format back into the XML format defined by ArchiMate, so that they can be imported directly into the ArchiMate editor.
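
As an illustration of the read-in step, here is a minimal sketch (not the project code) that parses an ArchiMate exchange file into a simple node/edge structure; the tag and attribute names follow the ArchiMate model-exchange format as I understand it and should be verified against the concrete files:

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

def load_model(path):
    """Read an ArchiMate exchange-format XML file into (nodes, edges)."""
    root = ET.parse(path).getroot()
    nodes, edges = {}, []
    for el in root.iter():
        tag = el.tag.split("}")[-1]  # strip the XML namespace
        if tag == "element":
            nodes[el.get("identifier")] = el.get(XSI_TYPE)
        elif tag == "relationship":
            edges.append((el.get("source"), el.get("target"), el.get(XSI_TYPE)))
    return nodes, edges

nodes, edges = load_model("model.xml")  # placeholder file name
print(len(nodes), "elements,", len(edges), "relationships")
```
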
My task mostly consisted of implementing the software, while my project partner dealt with the models and the ArchiMate file format. In addition, we developed a third approach of our own, which combines the aforementioned RefPa and MCC algorithms. The complete project can also be found here on GitHub.

This project was carried out during my internship at IBM. The goal was to learn from previous user interactions in the respective database systems and, on this basis, to propose rules that can increase the quality of the information contained in the database. These so-called automation rules are executed by the database itself, so they require no effort from the user. Previously, these rules had to be defined by the users themselves; thanks to the developed solution, the database can now propose rules on its own, based on rules that have already been defined.
The system was made available via a REST API so that other software components of the IBM Information Server can access it. During my internship, the prototype was also integrated into the user interface by another engineering team, so that the results can be inspected visually.
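
As a hypothetical sketch of how such a service can be exposed over REST (the endpoint name, payload, and helper below are illustrative, not the actual IBM API):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def suggest_rules(existing_rules):
    # Illustrative placeholder: the real system learns suggestions from
    # previous user interactions and already-defined automation rules.
    return [{"rule": r, "confidence": 0.9} for r in existing_rules]

@app.post("/rule-suggestions")  # hypothetical endpoint
def rule_suggestions():
    payload = request.get_json(force=True)
    return jsonify(suggest_rules(payload.get("rules", [])))

if __name__ == "__main__":
    app.run()
```
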

This project was another task I worked on during my five-month internship at IBM. The goal was to group/cluster the columns of a database based on their content and then classify them. For this purpose, a predefined similarity metric over the columns of the databases was used. Based on the classification of the columns, the domains inherently present in each column's data can be inferred. The prerequisite is that the user of the database system has already classified a set of columns by hand, so that the similarity between classified and unclassified columns can be used to assign appropriate labels to the remaining columns of the database as well. This mechanism can then be used to increase the quality of the automation rules proposed by the database. The service was likewise made available as a REST API, so that other software components can consume it.
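
The core idea can be sketched in a few self-contained lines, with Jaccard similarity over column values standing in for the predefined metric (an illustration, not the actual implementation): each unclassified column receives the label of its most similar already-classified column, provided the similarity passes a threshold.

```python
def jaccard(a, b):
    """Similarity of two value sets; stand-in for the predefined metric."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def classify_columns(labeled, unlabeled, threshold=0.5):
    # labeled:   {column_name: (values, label)} classified by hand
    # unlabeled: {column_name: values} still to be classified
    result = {}
    for name, values in unlabeled.items():
        best_label, best_sim = None, threshold
        for ref_values, label in labeled.values():
            sim = jaccard(values, ref_values)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        result[name] = best_label  # None means "no confident match"
    return result

labeled = {"customer_city": (["Berlin", "Rostock", "Munich"], "CITY")}
print(classify_columns(labeled, {"ship_city": ["Rostock", "Hamburg", "Berlin"]}))
# -> {'ship_city': 'CITY'}
```
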

This project was also developed during my internship at IBM. My first task there was to analyze a set of historical build verification reports in order to reveal relationships between the failures of different software modules. Such build verification reports are produced every time a new version of the software is deployed: in the background of this process, automated tests are executed to ensure the correct operation of the software and thus its quality.
The solution I developed analyzes these build verification reports from past deployment processes and uses them as the data basis for an association analysis, which reveals correlations between the failures of different software components. The algorithm can be accessed via a command-line interface, so that the analysis can be triggered by, e.g., nightly cron jobs.
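
The underlying idea can be illustrated with a small self-contained sketch (toy data, not the IBM tool): each report is reduced to the set of modules that failed in one build, and pairwise rules of the form "if A fails, B tends to fail too" are derived with support and confidence:

```python
from collections import Counter
from itertools import permutations

# Toy data: the modules that failed in each historical build verification run.
reports = [
    {"auth", "db"},
    {"auth", "db", "ui"},
    {"ui"},
    {"auth", "db"},
]

item_counts, pair_counts = Counter(), Counter()
for failed in reports:
    item_counts.update(failed)
    pair_counts.update(permutations(sorted(failed), 2))

MIN_SUPPORT, MIN_CONFIDENCE = 2, 0.8
for (a, b), n in sorted(pair_counts.items()):
    confidence = n / item_counts[a]
    if n >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
        print(f"{a} fails -> {b} fails (support={n}, confidence={confidence:.2f})")
```
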

The task was to implement several benchmark tasks on different Map/Reduce environments for Big Data analytics and thus compare the systems. The comparisons were performed on a server cluster with 4 CPUs (10 cores each), 512 GB of main memory, over 160 TB of disk storage, and 8 TB of SSDs. Suitable platforms had to be selected from the following: Hadoop (Apache), Spark (Apache), Naiad (Microsoft), Flink (Apache), TensorFlow (Google), and Postgres-XL (a DBMS with parallel processing). The results were then to be related to previous research and evaluations from the US (the so-called Stonebraker benchmarks). Data from social networks (a Twitter follower graph) was used for this purpose. Three test scenarios were developed: a grep task, a join task (both defined by Stonebraker), and a self-selected mining algorithm (k-means clustering).
Ultimately, Apache Spark and Apache Flink were chosen as platforms, with Hadoop as the basis for testing. I developed the overall framework for the test scenarios and dealt intensively with both the platforms and the benchmark tasks to be implemented. Along the way, I used the tools Asana (project management), CentOS, the Linux console, PuTTY, Git, WinSCP, JetBrains PyCharm, JetBrains IntelliJ IDEA, Apache Maven, Apache Hadoop, Apache Spark, and Apache Flink.
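
For illustration, the grep task can be phrased in a few lines of PySpark; the file path and search pattern below are placeholders, not the original setup:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("grep-benchmark").getOrCreate()

# Scan the input line by line and count the lines containing the pattern.
matches = (
    spark.sparkContext.textFile("hdfs:///data/twitter_edges.txt")  # placeholder
         .filter(lambda line: "pattern" in line)                   # placeholder
)
print(matches.count())
spark.stop()
```
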
The resulting paper is only available in German and can be downloaded here.

Awards

B.Sc. Ole Fenske (Institute of Computer Science, supervisors Prof. Dr. rer. nat. habil. Andreas Heuer and Dr.-Ing. Holger Meyer) will receive the INFO.RO Sponsorship Award for the best bachelor’s thesis of the 2017/2018 academic year on December 20, 2018, for his work on “Parallel Graph Mining Techniques for the Evaluation of Hypergraph Structures” in the bachelor’s program in Information Systems.

The thesis is in the area of Digital Humanities and is specifically dedicated to graph mining in the context of the international ISEBEL project, which investigates a composite of several narrative databases using text and graph mining techniques. Mr. Fenske investigated the following questions in this work: How can the contents of the narrative databases be represented as graph structures? Which mining techniques can be applied to the hypergraph structures used? What does a method look like that uses current cluster architectures and frameworks to search for frequent patterns efficiently and in parallel? Mr. Fenske not only investigated the state of the art of Frequent Subgraph Mining (FSM) in detail, but also found solutions for the subproblems of candidate representation and generation and developed a parallelized algorithm (PaSiGraM).

The prize has been awarded annually since 2012 by the association Informatik-Forum Rostock e.V. (INFO.RO) for an outstanding bachelor’s thesis written at the Institute of Computer Science by students of the Faculty of Computer Science and Electrical Engineering. The prize is endowed with 100 euros.

Link to the original article

During the two-day hackathon, which attracted academics from around the world, participants learned about building complex real-time data analytics pipelines based on open-source cluster computing frameworks (i.e., Apache Flink and Apache Spark). In the open coding session, participants implemented their own ideas using these frameworks. The challenge was to select one or more real-time data sources, either from the provided examples or self-selected, and to create real-time analytics that improve strategic planning or decision making.

The participants analyzed the virtual company INNOGAME AG in project teams. The goals were to identify challenges, workflows, IT systems used, etc., to determine any necessary changes, and to develop measures for the transformation of the company. The teams were able to apply the knowledge acquired in the course as well as the “4EM method” of enterprise modeling to capture and analyze the various components and interrelationships of the enterprise architecture. Another component of the competition was the presentation and defense of the developed transformation strategy in front of the decision-making body of the company, which was represented here by the competition jury.

Link to the original article

Curriculum vitae

  • Mar. 2022 - present

    University of Rostock

    PhD student at the Chair of Mobile Multimedia Information Systems.
    Investigating neuro-symbolic AI and the integration of a priori/expert knowledge into neural networks.


  • Mar. 2021 - Feb. 2022

    Center for Artificial Intelligence in MV

    Research scientist: creating AI prototypes, supervising student theses, and consulting SMEs in the Mecklenburg area on AI-related topics.

  • Oct. 2018 - Dec. 2020

    Master of Science Business Informatics at the University of Rostock

    Specialization in the field of machine learning, especially in the topics related to neural networks (NLP, neuroevolution, image processing, etc.).

    Final grade: 1.0


  • Oct. 2018 - Dec. 2020

    Research Assistant at the Chair of Information and Database Systems

    Supervision of student research projects, implementation of a frequent-subgraph mining framework within the international ISEBEL research project.

  • May 2018 - Sept. 2018

    IBM Deutschland Research & Development GmbH

    Machine learning and workflow management intern.

  • Apr. 2017 - Apr. 2018

    Student Assistant at the Chair of Information and Database Systems

    Research projects in the area of Big Data analytics and machine learning.

    Platforms used: IBM Bluemix, Apache Flink, Apache Spark


  • Oct. 2016 - Feb. 2017

    Academic tutor at the Chair of Business Informatics

    Module "Fundamentals of Business Informatics":

    - Programming with Python

    - Entity-Relationship Modeling

    - Basics of databases and SQL


  • 2014-2018

    Bachelor of Science Business Informatics at the University of Rostock

    Specialization in the topics of databases and artificial intelligence.