"In times of 'big data' and mass data processing, data quality is a major issue," says Klessinger. His research focuses on the automatic detection of dependencies in semi-structured data. These dependencies can be used to describe the structure of data more accurately than previous approaches (in what is called a "schema"). A more precise schema also makes work easier for data consumers (software developers, for example) by giving them a more accurate idea of what the data look like.
If your description of the dependencies or the structure of processed data is too narrow, new data that are actually valid may be identified as faulty. However, if the description lacks precision, data that are actually faulty will not be recognised as such.
Stefan Klessinger, Research Assistant at the Chair of Scalable Database Systems
Since October 2021, Klessinger has been working in both international and national teams at the Chair of Scalable Database Systems held by Professor Stefanie Scherzinger, who herself researches semi-structured data. "On account of the diversity in these research groups, there are lots of exciting ideas," says Klessinger, who has been working on his current research topic for about one year now. "This has given rise to various points of departure. The discussions in the teams, and also during international conference trips, are inspiring and motivating."
His research combines two thematic areas that have so far been studied largely independently of each other: automatic structure recognition in semi-structured data, on the one hand, and automatic detection of dependencies in structured data, on the other. He explains that a major difficulty in both areas is that the structure or the dependencies need to be described adequately, but not in too much detail. Frequently, automatically detected dependencies hold only by chance for the data under consideration and can be invalidated once additional data are included. Likewise, the structure of different records from the same dataset may vary, which is a strong incentive to find a meaningful abstraction of the identified structure. "If your description of the dependencies or the structure of processed data is too narrow, new data that are actually valid may be identified as faulty. However, if the description lacks precision, data that are actually faulty will not be recognised as such."
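The trade-off described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the attribute names and the three check functions are hypothetical and are not taken from Klessinger's work.

```python
# Hypothetical person records (illustrative assumption, not real data).
valid_new = {"first_name": "Anna", "surname": "Maier", "birth_year": 2000}
faulty = {"first_name": "Ben", "surname": "Huber", "birth_year": "unknown"}


def too_narrow(record):
    # Demands every attribute ever observed, including an optional second name.
    required = {"first_name", "second_name", "surname", "birth_year"}
    return required <= record.keys() and isinstance(record["birth_year"], int)


def too_loose(record):
    # Only checks that a surname exists and says nothing about types.
    return "surname" in record


def balanced(record):
    # Requires the core attributes and a numeric birth year,
    # but tolerates a missing second name.
    required = {"first_name", "surname", "birth_year"}
    return required <= record.keys() and isinstance(record["birth_year"], int)


for name, check in [("too narrow", too_narrow),
                    ("too loose", too_loose),
                    ("balanced", balanced)]:
    print(name, "| valid record accepted:", check(valid_new),
          "| faulty record accepted:", check(faulty))
```

The overly narrow description rejects a valid record that simply lacks a second name, while the overly loose one lets a record with a non-numeric birth year pass.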
An example to illustrate this point
A dataset describes people using what are called "attributes", including first name, second name, surname, date of birth and generation. Current approaches focus on recognising that the "second name", for instance, is not always available, or that the "date of birth" is a number whereas the other attributes are character strings. Klessinger's research is about formulating more precise descriptions using what are called "dependencies". If the date of birth is given as 2000, for instance, a dependency would make clear that the record relates to "Generation Z".
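As a rough sketch of what such a dependency means in practice, the following Python snippet checks whether one attribute determines another across a set of records. The function name, the sample records and the generation labels are assumptions made for illustration; this is not the detection method used in the research.

```python
def dependency_holds(records, lhs, rhs):
    """Return True if each value of `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for record in records:
        if lhs not in record or rhs not in record:
            continue  # semi-structured data: attributes may be missing
        key, value = record[lhs], record[rhs]
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample records (illustrative assumption).
sample = [
    {"first_name": "Anna", "birth_year": 2000, "generation": "Z"},
    {"first_name": "Ben", "birth_year": 1985, "generation": "Y"},
]
# One additional record that shares a first name with an earlier one.
more = sample + [{"first_name": "Anna", "birth_year": 1970, "generation": "X"}]

print(dependency_holds(sample, "first_name", "generation"))  # holds, but only by chance
print(dependency_holds(more, "first_name", "generation"))    # invalidated by the new record
print(dependency_holds(more, "birth_year", "generation"))    # still holds
```

On the small sample, the first name appears to determine the generation, but this dependency holds only by chance and is invalidated as soon as a second "Anna" from another generation is added. The dependency from birth year to generation survives the additional data.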
This research earned Klessinger first place in the "Student Research Competition" at this year's "International Conference on Management of Data" (SIGMOD), which took place in Seattle in June and is regarded as one of the most important international conferences on databases. Chair holder Professor Stefanie Scherzinger expressed her delight: "It's the second time in a row that a staff member of the Chair has made it to the final round of the ACM SIGMOD Student Research Contest. I am really very pleased that Mr Klessinger won the competition this year."
About Stefan Klessinger
Stefan Klessinger has been studying at the University of Passau since 2013. After earning his bachelor's degree in internet computing in 2019, he completed his master's degree in computer science. He has been working as a research assistant at the Chair of Scalable Database Systems held by Professor Stefanie Scherzinger since October 2021.