Last update on topic selection procedure: 25.02.2025 - Selection procedure for 2025S is available and starts 25.02.2025.
Open Topics for Bachelor Theses
If you are looking for a Bachelor Thesis topic, please register for the Bachelor Thesis course, either 051065 LP Softwarepraktikum mit Bachelorarbeit (old curriculum) or Group 1 of 051080 LP Softwarepraktikum mit Bachelorarbeit (new curriculum since 2022W). Please look up the list of dedicated topics offered below in Section (A) in the current semester. All those listed topics are available for bachelor theses unless there is a corresponding restriction stated in the topic description. Of course, the topic will be limited in effort and scientific claims to meet the requirements and effort (12 ECTS or 15 ECTS) of a Bachelor Thesis. If you are interested and need to clarify details, do not hesitate to contact us; send an e-mail to Prof. Wolfgang Klas, Prof. Gerald Quirchmayr, or contact a member of the research group.
» Topics for Bachelor Thesis - see the listing in Section (A) below.
Please, make sure you follow the "Instructions: How to get a topic for my SPBA Bachelor Project" given here.
Before contacting us, PLEASE read the » Recommendations & Guidelines for Bachelor Thesis available here.
Open Topics for Master Theses and Practical Courses (PR, P1, P2)
In the following, some of the open topics in the area of Multimedia Information Systems are listed. If you are interested and have an idea for a project, do not hesitate to contact us; send an e-mail to Prof. Wolfgang Klas or contact a member of the research group. In case of P1 or P2 projects, please make sure you follow the "Instructions: How to get a topic for my P1 or P2 Project" given here.
In general, topics in the area of Multimedia Information Systems technologies include:
- analyze, manage, store, create and compose, semantically enrich & play back multimedia content;
- semantically smart multimedia systems;
- security.
Possible application domains include:
- Detecting conflicting information and checking facts on the Web
- Content Authoring and Management Systems
- Multimedia Web Content Management
- Robotic and IoT Applications
- Blockchain Technologies and Applications
- Interactive Display Systems
- Game-based Learning
- Service Oriented Architecture (SOA) and Cloud Based Services
Section (A) below lists topics that can be chosen in the course of a PR Praktikum, but are in principle also available for a master thesis (usually expanded and more advanced).
Section (B) below lists topics that are intended to be chosen for a master thesis.
(A) Topics for Practical Courses (SPBA, PR P1, PR P2)
CL/GQ01: Information Security Policy Repository
Different types of information security policies make it difficult to access relevant passages in individual policies and combine them into an actionable recommendation. A repository in which the policies are stored, and their sections can be accessed in relation to their relevance in each situation, ranging from editing and auditing policies to applying them to incident handling, would therefore be very helpful. To be effective, the developed repository needs to support the information security policy life cycle.
The goal of the project is to develop such a policy repository/store model and implement a prototype. For searching the policy store, AI technologies as well as traditional search mechanisms building on keywords and ontologies should be considered.
The following sources can serve as a starting point for this project:
M. Alam and M. U. Bokhari, "Information Security Policy Architecture," International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007), Sivakasi, India, 2007, pp. 120-122, doi: 10.1109/ICCIMA.2007.275.
Kenneth J. Knapp, R. Franklin Morris, Thomas E. Marshall, Terry Anthony Byrd, Information security policy: An organizational-level process model, Computers & Security, Volume 28, Issue 7, 2009, Pages 493-508, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2009.07.001.
Nader Sohrabi Safa, Rossouw Von Solms, Steven Furnell, Information security policy compliance model in organizations, Computers & Security, Volume 56, 2016, Pages 70-82, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2015.10.006.
Hanna Paananen, Michael Lapke, Mikko Siponen, State of the art in information security policy development, Computers & Security, Volume 88, 2020, 101608, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2019.101608.
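As a sketch of the traditional keyword-based search that should be considered alongside AI-based retrieval, a minimal inverted index over policy sections might look as follows (policy IDs, section names, and texts are invented for illustration):

```python
from collections import defaultdict

def build_index(policies):
    """Toy keyword index over policy sections, as a traditional-search
    baseline next to AI-based retrieval.
    policies: {policy_id: {section_name: section_text}}."""
    index = defaultdict(set)
    for pid, sections in policies.items():
        for sec, text in sections.items():
            for word in text.lower().split():
                index[word].add((pid, sec))
    return index

# Invented example policies, not real policy content.
policies = {
    "ISP-1": {"incident-handling": "report every security incident immediately"},
    "ISP-2": {"auditing": "audit logs retained for one year"},
}
index = build_index(policies)
print(sorted(index["incident"]))  # [('ISP-1', 'incident-handling')]
```

A real repository would extend this with ontology-based query expansion and life-cycle metadata per section.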
The suggested structure for the paper accompanying the project is:
- Introduction/Topic description/Motivation
- State of the art in literature and practice
- Modelling method and approach used
- Development of the model
- Prototype (documentation, source code, etc.)
- Test
- Discussion of the results
- Outlook and conclusion
- Tags:
- Contact: Gerald Quirchmayr, Christian Luidold
CL/GQ02: Threat Intelligence for Cyber Security Decision Making
Cyber Security Decision Making is becoming a core aspect of cyber defense efforts. Advanced decision models and processes, such as the OODA Loop, depend heavily on the available information. The major task of this project is to develop and implement an approach to support the OBSERVE (information collection) and ORIENT (information analysis) phases of this type of model.
https://www.airuniversity.af.edu/Portals/10/AUPress/Books/B_0151_Boyd_Discourse_Winning_Losing.PDF
The topic can be split into two parts:
CL/GQ02a: Develop a support approach for the OBSERVE phase based on readily available sources, such as CVEs, NVD, and MISP. The import interface should ideally be based on the STIX/TAXII standard.
CL/GQ02b: Develop a support approach for the ORIENT phase exploring the potential of “emerging patterns” and “weak signals” in network defense. The goal is to monitor internal network traffic and map it on the information collected in the OBSERVE phase.
The following sources can serve as a starting point for this project:
https://levelblue.com/blogs/security-essentials/incident-response-methodology-the-ooda-loop
https://cve.mitre.org/; https://nvd.nist.gov/; https://www.misp-project.org/
https://www.oasis-open.org/2021/06/23/stix-v2-1-and-taxii-v2-1-oasis-standards-are-published/
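A real OBSERVE-phase importer would typically build on STIX/TAXII client libraries; the following stdlib-only sketch just illustrates pulling objects of interest out of a STIX 2.1-style bundle (the bundle content is handcrafted, not real threat data):

```python
import json

# A minimal STIX 2.1-style bundle as it might arrive from a TAXII feed.
bundle_json = """
{
  "type": "bundle",
  "id": "bundle--11111111-1111-4111-8111-111111111111",
  "objects": [
    {"type": "vulnerability", "name": "CVE-2024-0001",
     "id": "vulnerability--22222222-2222-4222-8222-222222222222"},
    {"type": "indicator", "name": "suspicious-domain",
     "id": "indicator--33333333-3333-4333-8333-333333333333"}
  ]
}
"""

def extract_by_type(bundle, stix_type):
    """Collect the names of all objects of one STIX type from a bundle."""
    return [o["name"] for o in bundle.get("objects", [])
            if o.get("type") == stix_type]

bundle = json.loads(bundle_json)
print(extract_by_type(bundle, "vulnerability"))  # ['CVE-2024-0001']
```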
The suggested structure for the paper accompanying the project is:
- Introduction/Topic description/Motivation
- State of the art in literature and practice
- Modelling method and approach used
- Development of the model
- Prototype (documentation, source code, etc.)
- Test
- Discussion of the results
- Outlook and conclusion
- Tags:
- Contact: Gerald Quirchmayr, Christian Luidold
CL/GQ03: A containerized communication model for communication between NIST/CSF phases
The NIST Cyber Security Framework (https://www.nist.gov/cyberframework) has become an established standard for cyber security management. With version 2.0 of this framework introducing a GOVERN function (NIST CSWP 29, The NIST Cybersecurity Framework (CSF) 2.0, February 26, 2024, p. 3), the importance of communication between the functions has increased significantly, as GOVERN addresses an understanding of organizational context; the establishment of cybersecurity strategy and cybersecurity supply chain risk management; roles, responsibilities, and authorities; policy; and the oversight of cybersecurity strategy.
The goal of this project is to develop a communications model with the GOVERN function as a central command and control hub. This model should then be followed by a prototype based on container technology. The focus of the model and prototype is the communication between the GOVERN function and the other functions (see figure).
(Figure: NIST CSWP 29, The NIST Cybersecurity Framework (CSF) 2.0, February 26, 2024, p. 5)
The following sources can serve as a starting point for this project:
NIST CSWP 29, The NIST Cybersecurity Framework (CSF) 2.0, February 26, 2024, https://www.nist.gov/cyberframework
Use containers to Build, Share and Run your applications: https://www.docker.com/resources/what-container/
The suggested structure for the paper accompanying the project is:
- Introduction/Topic description/Motivation
- State of the art in literature and practice
- Modelling method and approach used
- Development of the model
- Prototype (documentation, source code, etc.)
- Test
- Discussion of the results
- Outlook and conclusion
- Tags:
- Contact: Gerald Quirchmayr, Christian Luidold
PK01 - AI-Powered Shopping Assistant for Enhanced Product Discovery and Customer Service
This project aims to develop an intelligent virtual shopping assistant to improve the online shopping experience. The assistant will focus on two key areas:
- Personalized Product Recommendations: The system will analyze user browsing history, past purchases, and product data to provide tailored recommendations. This will involve exploring collaborative filtering and content-based filtering techniques.
- Intelligent Chatbot for Customer Service: The assistant will feature a conversational interface (chatbot) capable of handling common customer inquiries, such as order tracking, return requests, and product-related questions. This will utilize natural language understanding (NLU) and natural language generation (NLG) techniques.
This project presents the challenge of building a robust and accurate recommendation system and a natural language understanding model capable of handling the nuances of customer service interactions within a specific e-commerce domain (e.g., fashion or electronics).
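For the collaborative-filtering part, a minimal item-based sketch could look like this (toy ratings matrix, all numbers invented; assumes numpy is available):

```python
import numpy as np

# Toy user-item rating matrix (rows: users, cols: items); 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(A):
    """Item-item cosine similarity over the columns of A."""
    norms = np.linalg.norm(A, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    X = A / norms
    return X.T @ X

S = cosine_sim(R)

def recommend(user, R, S):
    """Score unrated items by similarity-weighted ratings of rated items."""
    rated = R[user] > 0
    scores = S[:, rated] @ R[user, rated]
    scores[rated] = -np.inf  # do not re-recommend rated items
    return int(np.argmax(scores))

print(recommend(0, R, S))  # 2: the only item user 0 has not rated yet
```

A production system would add content-based features and proper train/test evaluation on real interaction data.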
- Technologies: Python, Scikit-learn, TensorFlow/PyTorch, Transformers (Hugging Face), Flask/Django, React, AWS/Azure/GCP
- Tags:
- Contact: Peter Kalchgruber
PK03 - Streamlining the Exam Process: Developing a Web App for Efficient Exam Creation and Evaluation
Traditional paper-based multiple-choice exams remain a common assessment method in many educational settings. However, the process of manually creating, distributing, collecting, and grading these exams is often time-consuming and prone to errors. Teachers spend significant effort designing exam layouts, printing copies, and then painstakingly evaluate each answer sheet. Manually tabulating scores and generating reports further adds to the administrative burden. Furthermore, the potential for human error during grading and data entry can compromise the accuracy and fairness of the assessment process. A streamlined, automated solution can significantly reduce this workload and improve the overall efficiency and reliability of exam administration.
This project aims to address the challenges of traditional paper-based exams by developing a web-based system for automated exam creation and grading. The application will enable teachers to design multiple-choice exams using a predefined, fixed layout, simplifying the design process and ensuring compatibility with the automated grading system. The system will generate printable PDF versions of the exams for distribution. After students complete the exams, teachers can scan the completed papers, and the system will utilize Optical Mark Recognition (OMR) technology to automatically extract student answers. The OMR component will include preprocessing steps for rotation and skew correction of scanned images. Crucially, the system will incorporate a user-friendly interface for manual review and correction of any ambiguous or unrecognized marks, ensuring high grading accuracy. The final output will provide comprehensive reports, including individual student scores, class averages, and data export capabilities in CSV or Excel format, significantly reducing the administrative burden on educators and improving the efficiency of the assessment process.
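The mark-extraction step can be illustrated with a deliberately simplified sketch: after deskewing, split the answer region into a grid and treat a cell as "filled" if it is dark enough. Grid layout and threshold are illustrative assumptions; a real OMR pipeline would use an image-processing library such as OpenCV for rotation and skew correction.

```python
import numpy as np

def read_marks(sheet, rows, cols, fill_threshold=0.5):
    """Split a grayscale answer region into a rows x cols grid and report
    which cells are filled (mean darkness above fill_threshold).
    sheet: 2-D array with values in [0, 1], where 1.0 = black ink."""
    h, w = sheet.shape
    marks = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cell = sheet[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            row.append(bool(cell.mean() > fill_threshold))
        marks.append(row)
    return marks

# Synthetic 2-question, 2-option sheet: Q1 -> option B, Q2 -> option A.
sheet = np.zeros((20, 20))
sheet[0:10, 10:20] = 1.0   # Q1, option B filled
sheet[10:20, 0:10] = 1.0   # Q2, option A filled
print(read_marks(sheet, rows=2, cols=2))  # [[False, True], [True, False]]
```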
- Technologies: React, Node.js (Express.js)/Python/Remix, PostgreSQL, OMR Library, PDF Generation Library
- Tags:
- Contact: Peter Kalchgruber
PK04 - Location Intelligence: An Information System to Intelligently Display the Location of Friends Using Ambient Devices
The ubiquity of mobile devices and location-sharing services has made it increasingly common for individuals to share their whereabouts with friends and family. While this connectivity offers convenience and peace of mind, constantly checking apps or websites for location updates can be disruptive and intrusive. There's a growing need for more subtle and ambient ways to stay informed about the general location of family and friends, without requiring constant attention and interaction. A dedicated, thoughtfully designed ambient display can provide this information at a glance, enhancing awareness and connection without overwhelming the user.
This project aims to design and implement an information system that intelligently displays the location of a user's friends and family using an ambient device. The student will define and build the specific ambient display, which could take various forms, such as a custom-built clock, a modified smart mirror, an interactive lamp, or any other creative physical computing implementation. The system will acquire location data from a chosen source; this could involve integrating with existing location-sharing services, developing a custom mobile application, or utilizing another appropriate method. The system should prioritize displaying location information in a clear, unobtrusive, and aesthetically pleasing manner. The project may optionally explore incorporating basic machine learning techniques to predict future locations based on historical data. The core technologies involved include a Raspberry Pi, Python, a database management system, and a Web API for communication between the location source and the ambient display.
- Technologies: Raspberry Pi, Python, SQLite/PostgreSQL, Web API, Location Sharing API, optional hardware such as servo motors (for controlling clock hands)
- Tags:
- Contact: Peter Kalchgruber
PK05 - Intelligent Sports Analytics: Web-Based Platform for Ranking and Analysis
Access to comprehensive sports data and insightful analytics should not be limited to mainstream sports. Many passionate communities participate in less-publicized athletic competitions, but lack the tools to easily track, analyze, and understand their performance. This project aims to empower these communities by providing a user-friendly, accessible platform for collecting, analyzing, and sharing results.
The core goal is to develop a Progressive Web App (PWA) that facilitates the extraction, processing, and analysis of sports data, particularly for sports with limited existing online resources. The PWA should support multiple data ingestion methods, including web scraping (where permissible) and manual input of HTML tables. The system will then analyze this data using a combination of rule-based systems and potentially machine learning techniques to provide various analytical features. These features include selectable ranking algorithms (e.g., Elo, Massey, Colley), trend visualization, result prediction, and the potential for user-contributed data to enhance accuracy. The PWA should be designed to be generic and adaptable to different sports.
[1] Hochbaum, Dorit S. "Ranking sports teams and the inverse equal paths problem." International Workshop on Internet and Network Economics. Springer, Berlin, Heidelberg, 2006.
[2] Sorensen, Soren P. "An overview of some methods for ranking sports teams." University of Tennessee, Knoxville (2000). (historical overview)
[3] https://www.sofascore.com (example of a sports data platform; for inspiration, NOT for scraping)
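Of the selectable ranking algorithms mentioned above, Elo is the simplest to sketch. The update rule, with the conventional 400-point scale and an assumed K-factor of 32:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo rating update for a match between players/teams A and B.
    score_a is 1.0 for a win of A, 0.5 for a draw, 0.0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Equal ratings, A wins: A gains 16 points, B loses 16.
print(elo_update(1500, 1500, 1.0))  # (1516.0, 1484.0)
```

Massey and Colley ratings instead solve a linear system over all results, which is why the platform should keep the ranking algorithm pluggable.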
- Technologies: Web Application, React, Firebase (or Supabase/Node.js with a database like PostgreSQL)
- Tags:
- Contact: Peter Kalchgruber
PK06 - Automated Sports Tracker Utilizing Mobile and Wearable Technology
Manually tracking workouts can be tedious and inaccurate, often relying on users to remember to start and stop recordings or manually input data. Existing solutions may not automatically distinguish between different activity types (running vs. cycling) or provide detailed insights into performance. This project addresses these challenges by developing a progressive web app (PWA) that leverages sensor data from both mobile devices and wearable sports watches (e.g., Garmin) to automatically detect, classify, and record various athletic activities. The system will learn from user-confirmed events by employing machine learning techniques, specifically time-series classification, to continuously improve activity recognition accuracy.
The project aims to develop a functional prototype of the sports tracking PWA. The project will implement: (1) Data acquisition from mobile and/or wearable device sensors (location, accelerometer, etc.). (2) Basic activity detection (e.g., using rule-based methods or a simple machine learning model). (3) A user interface to display tracked activities, allow manual correction, and view basic statistics (distance, time). (4) User authentication and data storage. In addition to the core functionality, the project will include: (1) Implementation and rigorous evaluation of a more sophisticated machine learning model (e.g., a time-series classifier using a recurrent neural network or a Hidden Markov Model) for activity recognition. (2) Integration with a mapping API (e.g., Google Maps) to visualize activity routes. (3) User profile management with customizable activity types and preferences. (4) A comprehensive evaluation of the system's accuracy, performance, and usability.
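A rule-based baseline for the basic activity detection could be as simple as thresholding the mean speed over a window; the thresholds below are illustrative assumptions, not calibrated values:

```python
def classify_activity(speeds_kmh):
    """Rule-based baseline: classify an activity window by mean speed (km/h).
    Threshold values are invented for illustration; a learned time-series
    classifier would replace this in the final system."""
    mean = sum(speeds_kmh) / len(speeds_kmh)
    if mean < 7:
        return "walking"
    if mean < 14:
        return "running"
    return "cycling"

print(classify_activity([4, 5, 6]))      # walking
print(classify_activity([10, 11, 12]))   # running
print(classify_activity([22, 25, 28]))   # cycling
```

Such a baseline also yields the labeled, user-confirmed windows that the later machine learning model can be trained on.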
- Technologies: Web Application, React, Firebase, TensorFlow
- Tags:
- Contact: Peter Kalchgruber
PK07 - Improving Transparency in the Grading Process: Developing a Web Application for Efficient and User-friendly Grade Assessment
If you reflect on your experiences as a student in school, you may recall that the process of grade assessment was not always transparent and sometimes seemed unfair. This topic proposes the development of a web app that enables teachers to grade students efficiently and quickly with minimal effort during their lectures, using a simple and user-friendly interface on their mobile devices. In addition, a backend system should be developed to manage the data, and a separate overview page should be designed to give students independent access to their current grading status.
The proposed system may feature basic fields for each student, such as schoolwork, homework, and collaboration. Alternatively, more advanced grading concepts, such as an XP-based grading system, may be developed. Data security and protection measures are to be considered at all times. Overall, the proposed study aims to address the lack of transparency in the grading process by providing a comprehensive and efficient grading system. The developed system is expected to be user-friendly, meet the requirements, and improve the grading process for both teachers and students.
[1] https://blog.haschek.at/xp-based-grading-system/
- Technologies: Web Application, React, Firebase
- Tags:
- Contact: Peter Kalchgruber
PK10 - Entity Resolution for Improved Web Data Comparison Based on Semi-structured Data
The FactCheck framework is designed to address the issue of conflicting data on the Web by providing a systematic approach to detect and resolve such discrepancies. It encompasses the entire fact comparison process, including data acquisition, comparison, presentation of results, and advanced analysis features. As a pioneering research initiative of our research group, FactCheck presents several challenging aspects and opportunities in its development and implementation.
In order to compare facts about an entity, such as a person, that are published on different websites, the system must first understand that these two representations of the entity are indeed referring to the same person. This can be challenging as data published on the Web often lack a uniform primary key.
This project aims to find a solution for identifying and linking FactSets from different datasets whose entities have varying characteristics. The effectiveness of this approach will be evaluated using a range of datasets.
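As a naive baseline for deciding whether two representations refer to the same entity, one might start from normalized name similarity; the threshold is an illustrative assumption, and a real pipeline would add blocking and attribute-level matching on top:

```python
from difflib import SequenceMatcher

def same_entity(a, b, threshold=0.85):
    """Naive pairwise matcher: normalized name similarity as a stand-in
    for a full entity-resolution pipeline. Threshold is illustrative."""
    def norm(s):
        # lowercase, drop periods, collapse whitespace
        return " ".join(s.lower().replace(".", " ").split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio() >= threshold

print(same_entity("Wolfgang Klas", "wolfgang  klas"))      # True
print(same_entity("Wolfgang Klas", "Gerald Quirchmayr"))   # False
```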
- Technologies: Entity Recognition, Python
- Tags:
- Contact: Peter Kalchgruber
PK16 - Smart Student Support Chatbot
Students often seek timely and accurate information about curriculum details or information about lecturers. This project aims to promote student engagement, support informed decision-making, and provide convenient access to valuable academic resources by developing an AI chatbot that is a knowledgeable, educated companion.
This project uses AI and NLP to develop an intelligent chatbot that responds to student queries about curriculum specifics and faculty information. By integrating the chatbot into a user-friendly interface, students can effortlessly access the information they need, enhancing their academic experience and interactions within the university environment.
- Technologies: NLP, ML, Python, Flask, Chatbot Framework
- Tags:
- Contact: Peter Kalchgruber
WK01 - Jupyter Notebooks for Dedicated Interactive Content of Courses
Jupyter Notebooks allow for the creation and sharing of documents that contain live code, equations, visualizations, and narrative text. Jupyter Notebooks are a well-established and well-recognized tool in academia and education in general, as well as in specific fields of research where it is important to provide for reproducibility of scientific results.
Goal of the project is to develop dedicated Jupyter Notebooks for specific course content relevant in the context of our courses (MOD, MCM, MST, MRE, MRS). The approach can be based on the existing framework that we already use for Jupyter Notebooks in some of our courses, but may also further improve or suggest new solutions for the framework as such. The selection of the programming language to be used needs to meet the requirements of the course content, most probably Python, but in fact is very flexible, as Jupyter Notebooks work with a variety of languages.
Mandatory requirement: Students must have understood the course content/material very well and should have passed the course already.
- Technology: Jupyter Notebook, Python, Jupyter Notebook Hub of the CS faculty, Markdown, VS Code (or similar IDE)
- Tags:
- Contact: Wolfgang Klas
WK02 - FactCheck - Precision Metrics
FactCheck is a framework for the detection and resolution of conflicting structured data on the Web. The FactCheck framework is the result of ongoing research at our research group. One of the central building blocks is the context-dependent comparison of structured data of various representations of one and the same real-world object or artefact. The comparison is guided by so-called precision metrics, a flexible and sophisticated technique for logically comparing structured data values. Precision metrics consist of logical predicates used to evaluate the comparison of structured data. Goal of the project is to design and implement an appropriate model for the representation of precision metrics, the construction of such precision metrics, as well as the application of the metrics for evaluating the comparison of data values. Various precision metrics should be defined and compared using a test dataset of 900,000 entities. Results of the project are to be demonstrated by a running demo application.
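A precision metric built from logical predicates might, for example, treat two numeric values as agreeing when their relative difference stays within a tolerance. This is only a sketch of the idea with invented example values, not the framework's actual model:

```python
def numeric_tolerance(rel_tol):
    """Precision-metric style predicate factory: two numeric values 'agree'
    if their relative difference is within rel_tol. Illustrative sketch."""
    def predicate(a, b):
        return abs(a - b) <= rel_tol * max(abs(a), abs(b), 1e-12)
    return predicate

within_10_percent = numeric_tolerance(0.10)
print(within_10_percent(1_815_231, 2_000_000))  # True: about 9.2% apart
print(within_10_percent(1_815_231, 2_500_000))  # False
```

Composing such predicates (numeric tolerance, string normalization, date granularity, ...) is what makes the comparison context-dependent.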
- Technology: Web Services, Semantic Web technologies, LOD, Microformat, JSON-LD, AI-Tools, Docker
- Provided to the students: existing implementation of framework, test dataset
- Tags:
- Contact: Wolfgang Klas and Daniel Berger
WK03 - Demo of Blockchain Application Using Ethereum
The goal of this project is the implementation of a demo application which illustrates the concept of the consensus technique, e.g., proof-of-stake or Clique (proof-of-authority) (but not the often-used "proof-of-work" as, e.g., used in the Bitcoin Blockchain). For example, a possible application could be the implementation of the four-eyes principle (Vier-Augen-Prinzip) for officially approving documents by making use of two signers acting as "proof-of-authorities". Many other application scenarios are feasible, e.g., the decision-taking principles of a management board of an association or a company, a board of managers, or a board of trustees or directors. The application scenario should be well-chosen in order to illustrate the general principle of proof-of-authority. It may be based on a generic, configurable implementation to show different variations of the proof-of-authority concept, e.g., 1 signer, 2 signers, N signers. The demo application has to be realized such that a short demonstration movie can be recorded, which will be published on the Lab's website.
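The four-eyes rule itself is easy to state off-chain. The following plain-Python sketch shows the N-of-M check that a proof-of-authority smart contract would enforce on-chain (signer names are invented):

```python
def approved(signers, authorities, required=2):
    """Four-eyes check: approval needs at least `required` distinct,
    authorized signers. Mirrors, in plain Python, the rule a
    proof-of-authority smart contract would enforce on-chain."""
    return len(set(signers) & set(authorities)) >= required

authorities = {"signer-A", "signer-B", "signer-C"}
print(approved({"signer-A", "signer-B"}, authorities))   # True
print(approved({"signer-A"}, authorities))               # False: one signature only
print(approved({"signer-A", "mallory"}, authorities))    # False: mallory not authorized
```

The `required` parameter is what makes the 1-signer / 2-signer / N-signer variations configurable.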
- Technology: Ethereum, Web technologies, Docker
- Tags:
- Contact: Wolfgang Klas
WK06 - "Studienleistungs & Prüfungspass" Based on Ethereum Blockchain Technology
The goal of this project is - starting out from a given demo implementation - to implement an application for a digital "Studienleistungs & Prüfungspass" (study performance & examination pass) based on blockchain technology. The pass will record the individual, required study achievements (like milestones, tests, etc.) during a course, the final grading of a course, and the collection of gradings of courses during the entire study (like a "Sammelzeugnis" currently used by the university). There are various stakeholders in this scenario: the students, the lecturers of courses, and the administration (like the SPL). The implementation has to be realized based on Ethereum Blockchain technology, which provides the concept of Smart Contracts. Ethereum Smart Contract technology is one of the most promising implementations for smart behavior of blockchain systems. The focus will be on the proper design and implementation of smart contracts to capture most of the functionality of the application.
- Technology: Ethereum Blockchain Infrastructure, on Linux or Windows, or on Cloud Infrastructure, Web technologies for implementing the Web-based application, Docker.
- Provided to the students: Optionally, virtual machine
- Tags:
- Contact: Wolfgang Klas
WK07 - Securing Images and Videos by Applying Blockchain Technology
The goal of this project is the design and the implementation of a framework based on blockchain technology that allows for the detection of manipulations in images and videos. Images or videos can be manipulated, e.g., persons (or other objects) can be added to or removed from an image, and video frames (or sequences of video frames) can be added to or removed from a video. Such a manipulation should be detected based on the storage of specific image encoding parts in a blockchain, which allows re-checking the validity of an image encoding. E.g., essential macroblocks or portions of some macroblocks of a JPEG-encoded image could be stored in a blockchain such that it can be checked whether an image still consists of those macroblocks or includes manipulated macroblocks. The project will first have to select and specify the kind of manipulations to be considered in the scope of the project, then design an approach and a framework, and implement a prototype and a demo application illustrating the approach, based on a specific blockchain platform that best suits the needs of the application.
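The core idea of storing verifiable image parts can be sketched without any blockchain at all: hash fixed-size tiles of an image, and later re-hash to locate manipulated regions. The example uses toy grayscale data; a real system would hash JPEG macroblocks and anchor the digests on-chain.

```python
import hashlib

def tile_digests(pixels, tile=8):
    """Hash fixed-size tiles of a toy grayscale image. Storing these digests
    on a blockchain would later allow re-checking each tile for manipulation.
    pixels: list of rows of ints in 0..255."""
    digests = {}
    for r in range(0, len(pixels), tile):
        for c in range(0, len(pixels[0]), tile):
            block = bytes(pixels[rr][cc]
                          for rr in range(r, min(r + tile, len(pixels)))
                          for cc in range(c, min(c + tile, len(pixels[0]))))
            digests[(r, c)] = hashlib.sha256(block).hexdigest()
    return digests

img = [[0] * 16 for _ in range(16)]
before = tile_digests(img)
img[0][0] = 255                      # manipulate one pixel
after = tile_digests(img)
changed = [k for k in before if before[k] != after[k]]
print(changed)  # [(0, 0)] -- only the affected tile differs
```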
- Technology: Blockchain Infrastructure (like Ethereum) on Linux, Windows, or on Cloud Infrastructure, JPEG, MPEG, Web-Technologies for implementing Web-based demo application, Docker.
- Provided to the students: Optionally, virtual machine
- Tags:
- Contact: Wolfgang Klas
WK09 - FactCheck - IdaFix Browser-Extension UI for a Chatbot
The FactCheck framework is designed to address the issue of conflicting data on the Web by providing a systematic approach to detect and resolve such discrepancies. It encompasses the entire fact comparison process, including data acquisition, comparison, presentation of results, and advanced analysis features. As a pioneering research initiative of our research group, FactCheck presents several challenging aspects and opportunities in its development and implementation.
This project aims to find a solution for a user interface which allows an end user visiting a web page to understand the comparison results on conflicting information as well as to provide user feedback on the FactCheck system behaviour. The interface should be realized as an interactive chatbot. The starting point for the project is a prototypically implemented browser extension (IdaFix) which illustrates the functionality as well as the internal system API to be used.
- Technology: Web Browser technologies, e.g., JavaScript, HTML & CSS, Browser APIs (e.g., WebExtensions API), Background Scripts, Content Scripts, Popup Scripts, Messaging APIs, JSON-LD
- Tags:
- Contact: Wolfgang Klas and Marie Aichinger
AH01 - Relation Extraction Web Service - Tool for extracting structured data
For many applications, it is required to extract structured data from unstructured text web pages. A wide variety of NLP approaches claim to be capable of extracting structured data from unstructured text.
Your task for this project is to implement a relation extraction web service using state-of-the-art information extraction models. This web service should accept other web pages as input and output the extracted triplets adhering to a pre-defined schema. Create a web application that is consuming the web service and acts as a front end for the web service.
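The service's output format could, for instance, group extracted triples into schema.org JSON-LD objects. The property names below are illustrative; the actual pre-defined schema has to be agreed on first:

```python
import json

def triples_to_jsonld(triples):
    """Group (subject, predicate, object) triples into one JSON-LD object
    per subject, using schema.org as context. The predicates used here
    are illustrative, not a fixed schema."""
    out = {}
    for s, p, o in triples:
        out.setdefault(s, {"@context": "https://schema.org", "name": s})[p] = o
    return list(out.values())

doc = triples_to_jsonld([
    ("Vienna", "@type", "City"),
    ("Vienna", "population", 2000000),
])
print(json.dumps(doc, indent=2))  # one JSON-LD object for Vienna
```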
- Technologies: Web Application, Web Service, Python, JavaScript, Hugging Face Transformers Library, SpaCy Library, PyTorch, Schema.org
- Tags:
- Contact: Adrian Hofer
AH02 - Entity Linking Web Service - Tool for linking entities to knowledge-bases
FactCheck is a framework for detecting and resolving conflicting data on the Web. It establishes an entire fact comparison process that consists of data acquisition, data comparison, the presentation of comparison results, and comprehensive analysis functions. FactCheck is a leading research topic of our research group and bears challenges in many aspects. To enhance data acquisition, your task is to link the extracted information to existing knowledge bases.
Use an existing approach for named entity recognition on the text of a web page. Link those named entities to WikiData or any other knowledge base like Wikipedia or DBpedia.
Create a web service that entity-links named entities on web pages. Visualize the results in an understandable way and make them browsable in a web application. To achieve this, iterate over a requirements analysis and design process to develop a promising implementation.
- Technologies: Web Application, Web Service, Python, JavaScript, Hugging Face Transformers Library, SpaCy Library, PyTorch, Schema.org
- Tags:
- Contact: Adrian Hofer
AH03 - Single Page Conflict Detection
FactCheck is a framework for detecting and resolving conflicting data on the Web. It establishes an entire fact comparison process that consists of data acquisition, data comparison, the presentation of comparison results, and comprehensive analysis functions. FactCheck is a leading research topic of our research group and bears challenges in many aspects. We would like to try this mechanism on a single web page with a comment section.
A user on a single web page, like a Reddit post, Facebook post, or news article with a comment section, cannot grasp how conflicting the article and the comments, or the comments among each other, are.
Create a browser extension that displays the conflicts on a single page to the user. This browser extension could (a) let the user choose a word or a statement and detect the conflicts for that word or statement, or (b) detect all possible conflicts on a single web page.
- Technologies: Web Service, Python, JavaScript
- Tags:
- Contact: Adrian Hofer
DB01 - Developing a mapping method to consistently translate between vocabularies
FactCheck is a framework for detecting and resolving conflicting data on the Web. It establishes an entire fact comparison process that consists of data acquisition, data comparison, the presentation of comparison results, and comprehensive analysis functions. FactCheck is a leading research topic of our research group and bears challenges in many aspects.
We define facts as pieces of information that are published by data providers (e.g., as textual content in their website(s)). If two or more websites publish data on the same topic, we humans can compare the data critically. However, this task is quite difficult for a machine, as it does not have an inherent understanding of semantics. Imagine the following example:
You visit website A, which states that Vienna has 1,815,231 inhabitants, while website B states Vienna has approximately 2,000,000 inhabitants. Depending on the context, both numbers can be seen as true or not precise enough. This is a problem, as we cannot tell if the numbers are similar enough or if one of them is too far from the truth, making it a conflict. Now imagine a website C, which states the population of Vienna is 2,000,000. A new problem emerges, as website C offers us the same fact as websites A and B; however, it uses "population" instead of "inhabitants". As humans, we can tell that we now have two sources that agree, B and C.
However, we cannot make sure that the two websites use the same vocabulary (here, "inhabitants" and "population"). A machine is unable to understand the similarity between these concepts like a human would. Furthermore, websites may use structured data but incorrect properties by mistake. In both cases, our ability to compare facts is inhibited. This topic aims to develop a method that relies on structured data from websites to properly validate and translate between schemata.
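The vocabulary-mapping step described above can be sketched as a normalization of property names to one canonical (schema.org-style) property before values are compared. The synonym table below is hand-made for illustration; the actual method would derive such mappings from the websites' structured data and lexical resources.

```python
# Sketch of vocabulary mapping: normalize property names from different
# websites to one canonical property before comparing values.
# The synonym table is an illustrative assumption.

CANONICAL = {
    "inhabitants": "schema:populationSize",
    "population": "schema:populationSize",
    "number of residents": "schema:populationSize",
    "founded": "schema:foundingDate",
    "established": "schema:foundingDate",
}

def normalize_property(name: str) -> str:
    key = name.strip().lower()
    # Fall back to the raw name when no mapping is known.
    return CANONICAL.get(key, key)

def comparable(prop_a: str, prop_b: str) -> bool:
    return normalize_property(prop_a) == normalize_property(prop_b)

print(comparable("Inhabitants", "population"))  # True
```

With both properties mapped to `schema:populationSize`, the values from websites A, B, and C become directly comparable.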
- Technology: Python, NLTK, Scikit-learn, Azure Cloud Services, Schema.org
- Tags:
- Contact: Daniel Berger
DB02 - Type-based String comparison methods
FactCheck is a framework for detecting and resolving conflicting data on the Web. It establishes an entire fact comparison process that consists of data acquisition, data comparison, the presentation of comparison results, and comprehensive analysis functions. FactCheck is a leading research topic of our research group and bears challenges in many aspects.
We define facts as pieces of information that are published by data providers (e.g., as textual content in their website(s)). If two or more websites publish data on the same topic, we humans can compare the data critically. However, the comparison of data is not trivial. Imagine the following example:
Website A has information about musicians and states the name "Taylor Swift." Website B, which is about this particular pop musician, also states a name, however as "Tailor Swift." Website C splits the name property into the first name "T." and the last name "Swift."
As humans, we can compare these strings, identify issues, and acknowledge abbreviations and spelling errors. However, for a machine, these things are a tricky challenge. Furthermore, there are other problems that may be faced in string comparison (name-to-nickname comparison, comparison of longer text, capitalization and spelling errors, homonyms,…).
This topic aims to develop a method that can reliably handle string comparisons based on the different schema types available and delivers highly accurate results.
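A type-aware comparison for the name example above could be sketched as follows, assuming the values are typed as person names: abbreviated name parts ("T." vs. "Taylor") are compared by initial, and full parts are compared fuzzily to tolerate spelling errors ("Tailor"). The 0.8 threshold is an illustrative assumption.

```python
# Sketch of a type-aware name comparison: handle abbreviations per name part,
# and use fuzzy matching for spelling errors. Threshold chosen for illustration.

from difflib import SequenceMatcher

def parts_match(a: str, b: str) -> bool:
    a, b = a.lower().rstrip("."), b.lower().rstrip(".")
    if len(a) == 1 or len(b) == 1:          # abbreviation: compare initials
        return a[0] == b[0]
    return SequenceMatcher(None, a, b).ratio() >= 0.8  # tolerate typos

def names_match(name_a: str, name_b: str) -> bool:
    pa, pb = name_a.split(), name_b.split()
    return len(pa) == len(pb) and all(parts_match(x, y) for x, y in zip(pa, pb))

print(names_match("Taylor Swift", "Tailor Swift"))  # True
print(names_match("T. Swift", "Taylor Swift"))      # True
```

A full solution would select such a comparison strategy per schema type (person name, date, quantity, free text, ...) rather than using one generic string metric.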
- Technology: Python, NLTK, Scikit-learn, Azure Cloud Services, Schema.org
- Tags:
- Contact: Daniel Berger
DB03 - Semantical analysis of text sequences for comparison purposes
FactCheck is a framework for detecting and resolving conflicting data on the Web. It establishes an entire fact comparison process that consists of data acquisition, data comparison, the presentation of comparison results, and comprehensive analysis functions. FactCheck is a leading research topic of our research group and bears challenges in many aspects.
We define facts as pieces of information that are published by data providers (e.g., as textual content in their website(s)). If two or more websites publish data on the same topic, we humans can compare the data critically. We may encounter sequences of text not only on websites but also in (semi-)structured data. To be able to compare two datasets with long sequences of text, it is important to understand the content of the text and the context thereof.
The task of this project is to analyse text sequences, trying to gather semantic information in the form of a semantic graph. The extracted semantic graphs shall then be compared to find overlaps, conflicts, and disjoint information between texts on the same or a similar topic.
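The comparison step can be illustrated with two small graphs represented as sets of (subject, predicate, object) triples. In the project, rdflib graphs extracted from text would take this role; the set operations below only show how overlaps, conflicts, and disjoint information can be separated.

```python
# Toy comparison of two "semantic graphs", each a set of (s, p, o) triples.
# Overlap: identical statements; conflict: same subject+predicate with
# different objects; the rest is disjoint information.

GraphT = set[tuple[str, str, str]]

def compare(g1: GraphT, g2: GraphT) -> dict[str, set]:
    keys1 = {(s, p) for s, p, _ in g1}
    keys2 = {(s, p) for s, p, _ in g2}
    conflict = {
        (s, p) for (s, p) in keys1 & keys2
        if {o for s2, p2, o in g1 if (s2, p2) == (s, p)}
        != {o for s2, p2, o in g2 if (s2, p2) == (s, p)}
    }
    return {
        "overlap": g1 & g2,
        "conflict": conflict,
        "only_in_g1": {t for t in g1 if (t[0], t[1]) not in keys2},
        "only_in_g2": {t for t in g2 if (t[0], t[1]) not in keys1},
    }

a = {("Vienna", "population", "1815231"), ("Vienna", "country", "Austria")}
b = {("Vienna", "population", "2000000"), ("Vienna", "country", "Austria")}
print(compare(a, b)["conflict"])  # {('Vienna', 'population')}
```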
- Technology: Python, NLTK, rdflib, Scikit-learn, Azure Cloud Services, Schema.org
- Tags:
- Contact: Daniel Berger
DB04 - Precision Metric design and Framework
FactCheck is a framework for detecting and resolving conflicting data on the Web. It establishes an entire fact comparison process that consists of data acquisition, data comparison, the presentation of comparison results, and comprehensive analysis functions. FactCheck is a leading research topic of our research group and bears challenges in many aspects.
We define facts as pieces of information that are published by data providers (e.g., as textual content in their website(s)). If two or more websites publish data on the same topic, we humans can compare the data critically. However, this task is quite difficult for a machine, as it does not have an inherent understanding of text and its semantics.
A comparison between two data points may appear simple. However, we often encounter objects that contain multiple attributes and/or relationships. To be able to compare these objects, a more complex structure for comparison shall be created. Furthermore, these comparison elements should not need to be written manually; rather, their generation shall be supported by a tool.
The project focuses on the generation of "precision metrics": comparison elements that may be written by experts to compare elements in a domain. A metric may depend on input, context/type, availability functions, as well as other factors. The goal is to create a "precision metric" design that allows experts to formulate comparison metrics as executable processes. A framework shall be built that is able to generate, manage, and store metrics, as well as make them available by means of an API.
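One possible shape of such a framework, sketched minimally: a metric pairs metadata (name, the type it applies to) with an expert-written comparison function, and a registry manages and executes metrics. All names and the 15 % tolerance are illustrative assumptions, not the project's design.

```python
# Minimal sketch of a precision-metric registry: metadata plus an
# expert-written comparison function, managed and run by a registry.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PrecisionMetric:
    name: str
    applies_to: str                       # e.g. a schema.org type
    compare: Callable[[Any, Any], float]  # returns similarity in [0, 1]

class MetricRegistry:
    def __init__(self):
        self._metrics: dict[str, PrecisionMetric] = {}

    def register(self, metric: PrecisionMetric) -> None:
        self._metrics[metric.name] = metric

    def run(self, name: str, a, b) -> float:
        return self._metrics[name].compare(a, b)

registry = MetricRegistry()
registry.register(PrecisionMetric(
    name="population_tolerance",
    applies_to="schema:populationSize",
    # Treat numbers within 15 % of each other as a full match.
    compare=lambda a, b: 1.0 if abs(a - b) / max(a, b) <= 0.15 else 0.0,
))
print(registry.run("population_tolerance", 1815231, 2000000))  # 1.0
```

In the full framework, the registry would persist metrics to storage and expose `register`/`run` through an API instead of in-process calls.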
- Technology: UML, Python, Azure Cloud Services, Schema.org
- Tags:
- Contact: Daniel Berger
DB05 - Precision Metric design and execution model
FactCheck is a framework for detecting and resolving conflicting data on the Web. It establishes an entire fact comparison process that consists of data acquisition, data comparison, the presentation of comparison results, and comprehensive analysis functions. FactCheck is a leading research topic of our research group and bears challenges in many aspects.
We define facts as pieces of information that are published by data providers (e.g., as textual content in their website(s)). If two or more websites publish data on the same topic, we humans can compare the data critically. However, this task is quite difficult for a machine, as it does not have an inherent understanding of text and its semantics.
A comparison between two data points may appear simple. However, we often encounter objects that contain multiple attributes and/or relationships. To be able to compare these objects, a more complex structure for comparison shall be created. Furthermore, ideas on the execution of these metrics shall be collected and tested (including ASP, binary, and other implementations).
The project focuses on the generation of "precision metrics": comparison elements that may be written by experts to compare elements in a domain. A metric may depend on input, context/type, availability functions, as well as other factors. The goal is to create a "precision metric" design that allows experts to formulate comparison metrics as executable processes. Furthermore, an execution engine shall be implemented that is able to read precision metrics and select a runtime environment or executable code piece for the process accordingly.
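The runtime-selection idea can be sketched as a dispatcher: a metric is a declarative description carrying a `runtime` field, and the engine selects the matching executor. The two runtimes below ("python-expr" and "table") are illustrative assumptions only; the project would explore further runtimes such as ASP or binary implementations.

```python
# Sketch of an execution engine: read a declarative metric description and
# dispatch it to a matching runtime. Runtime names are illustrative.

def run_python_expr(spec: dict, a, b) -> float:
    # Evaluate a restricted Python expression over the two inputs.
    env = {"a": a, "b": b, "abs": abs, "max": max}
    return float(eval(spec["expr"], {"__builtins__": {}}, env))

def run_table(spec: dict, a, b) -> float:
    # Look up the score for an (a, b) pair in a predefined table.
    return spec["table"].get((a, b), 0.0)

RUNTIMES = {"python-expr": run_python_expr, "table": run_table}

def execute(metric: dict, a, b) -> float:
    runtime = RUNTIMES[metric["runtime"]]   # select the executable code piece
    return runtime(metric, a, b)

metric = {"runtime": "python-expr",
          "expr": "1.0 if abs(a - b) / max(a, b) <= 0.15 else 0.0"}
print(execute(metric, 1815231, 2000000))  # 1.0
```

A production engine would of course replace `eval` with a safely sandboxed interpreter or compiled execution.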
- Technology: UML, Python, Azure Cloud Services, Schema.org
- Tags:
- Contact: Daniel Berger
MSA01 - FactCheck - Chatbots as Interfaces for Fact Exploration
The FactCheck framework aims to address the issue of conflicting data on the Web by providing a systematic approach to detect and resolve such discrepancies. It encompasses the entire fact comparison process, including data acquisition, comparison, presentation of results, and advanced analysis features. As a pioneering research initiative of our research group, FactCheck presents several challenging aspects and opportunities in its development and implementation.
In recent years, chatbots have gained widespread attention and are used in various scenarios, including information retrieval. As they allow us to interact with information systems using natural language, they could serve as intuitive user interfaces for our FactCheck project. Your task is to develop an (NLP/LLM-driven) chatbot for FactCheck using a framework of your choice. The chatbot should allow users to learn about FactCheck and to explore our fact data and the insights gained from them. You may either...
- [P1 only] extend our existing chatbot (React + Azure Bot Service) embedded into the IdaFix browser extension,
- redesign and reimplement the existing IdaFix browser extension, or
- develop a new chatbot for either i) one of our existing FactCheck websites/dashboards or ii) a (Web) service you design yourself.
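At its simplest, the intent-matching loop behind such a chatbot looks like the sketch below: the user's message is matched against keyword rules with canned responses. An actual solution would use an NLP/LLM framework (e.g., Azure Bot Service); the rules and answers here are illustrative assumptions.

```python
# Minimal rule-based chatbot sketch: keyword rules with canned responses.
# Rules and answers are illustrative; a real bot would use NLP/LLM tooling.

RULES = [
    ({"what", "factcheck"}, "FactCheck detects and resolves conflicting data on the Web."),
    ({"conflict", "vienna"}, "Sources disagree on Vienna's population figures."),
]
FALLBACK = "Sorry, I don't know that yet. Try asking what FactCheck is."

def answer(message: str) -> str:
    words = {w.strip("?!.,").lower() for w in message.split()}
    for keywords, response in RULES:
        if keywords <= words:          # all rule keywords occur in the message
            return response
    return FALLBACK

print(answer("What is FactCheck?"))
```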
- Technologies: Chatbots, Azure AI services, Azure Bot Service, React, Web Service, Python, JavaScript
- Tags:
- Contact: Marie Aichinger
MSA02 - FactCheck - IdaFix: Visualize Facts in Webpages
Note: This topic is only available for P1.
The FactCheck framework aims to address the issue of conflicting data on the Web by providing a systematic approach to detect and resolve such discrepancies. It encompasses the entire fact comparison process, including data acquisition, comparison, presentation of results, and advanced analysis features. As a pioneering research initiative of our research group, FactCheck presents several challenging aspects and opportunities in its development and implementation.
The visualization of facts and comparison results is a vital part of FactCheck. To address this, the IdaFix extension was developed to help users explore and compare the facts embedded into a webpage. This visualization is currently contained in the extension's popup, and there is no way to have facts highlighted directly on the webpage (where they are primarily consumed). Your task is to extend the current implementation of IdaFix to highlight facts directly on the webpage by leveraging the capabilities of content scripts for WebExtensions, and, potentially, NLP.
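The highlighting idea is illustrated below in Python for brevity (the extension itself would do this in a JavaScript content script operating on the DOM): every known fact mention found in the page text is wrapped in a `<mark>` element. The CSS class name is an illustrative assumption.

```python
# Illustration of fact highlighting: wrap every known fact mention found in
# the page text in a <mark> element. In IdaFix this would be a JS content
# script working on the live DOM; the class name is an assumption.

import re
from html import escape

def highlight_facts(page_text: str, facts: list[str]) -> str:
    out = escape(page_text)
    for fact in facts:
        pattern = re.compile(re.escape(escape(fact)), re.IGNORECASE)
        out = pattern.sub(
            lambda m: f'<mark class="idafix-fact">{m.group(0)}</mark>', out)
    return out

print(highlight_facts("Vienna has 1,815,231 inhabitants.", ["1,815,231"]))
```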
- Technologies: WebExtensions API, Content Scripts, JavaScript
- Tags:
- Contact: Marie Aichinger
MSA03 - FactCheck - SPARQL for Fact Data
Recommended prerequisite: Multimedia and Semantic Technologies (MST)
The FactCheck framework aims to address the issue of conflicting data on the Web by providing a systematic approach to detect and resolve such discrepancies. It encompasses the entire fact comparison process, including data acquisition, comparison, presentation of results, and advanced analysis features. As a pioneering research initiative of our research group, FactCheck presents several challenging aspects and opportunities in its development and implementation.
In its current iteration, FactCheck collects fact data from the Web via a Web Extension and dedicated crawlers, and provides results and insights gained from them via a REST-inspired API. Allowing complex semantic queries over our collected data may be beneficial in delivering our results in a semantic-web-friendly way. Your task will be to enable semantic search over our data. First, you will revisit our current document-based data model and redesign it as a triple/graph-based model more closely aligned with OWL/RDF. This may involve finding suitable vocabularies (e.g., RDF Data Cube) for our data. Then, you will investigate possible storage solutions (e.g., a triple store such as Apache Jena TDB or RDF4J) for storing the redesigned data. Finally, you will enable semantic search by configuring a SPARQL endpoint and interface.
- Technologies: Web Application, Python, rdflib, SPARQL, Apache Jena, RDF4J
- Tags:
- Contact: Marie Aichinger
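The remodeling step at the heart of MSA03 can be sketched as converting a document-based fact record into subject-predicate-object triples. The field names, URI scheme, and prefixes below are assumptions for illustration; in the project, rdflib and a suitable vocabulary would be used, and the triples would be loaded into a triple store behind a SPARQL endpoint.

```python
# Sketch of document-to-triple remodeling: turn one document-style fact
# record into (s, p, o) triples. Field names and prefixes are assumptions.

def doc_to_triples(doc: dict) -> list[tuple[str, str, str]]:
    subject = f"fact:{doc['entity']}"
    triples = [(subject, "rdf:type", f"schema:{doc['type']}")]
    for prop, value in doc["properties"].items():
        triples.append((subject, f"schema:{prop}", str(value)))
    triples.append((subject, "prov:wasDerivedFrom", doc["source"]))
    return triples

doc = {"entity": "Vienna", "type": "City",
       "properties": {"populationSize": 1815231},
       "source": "https://example.org/article"}
for t in doc_to_triples(doc):
    print(t)
```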
MSA04 - FactCheck - Design and Deploy A Scalable Statistical Framework for Fact Data
The FactCheck framework aims to address the issue of conflicting data on the Web by providing a systematic approach to detect and resolve such discrepancies. It encompasses the entire fact comparison process, including data acquisition, comparison, presentation of results, and advanced analysis features. As a pioneering research initiative of our research group, FactCheck presents several challenging aspects and opportunities in its development and implementation.
A core aspect of FactCheck is the generation of statistical insights and metrics from the crawled fact data. Your task is to reimplement our existing statistics API as a scalable stand-alone application using an (ideally Python-based) technology stack of your choice (e.g., PySpark, Pandas, NumPy). The key steps will involve...
- Fact Data Exploration: Explore our existing fact database and familiarize yourself with our data model.
- Fact Data Extraction: Extract thousands of fact records from our existing database to be used as your starting dataset, and adapt the data schema if needed.
- Metrics: Develop new or refine existing metrics from the fact data.
- API Reimplementation: Rebuild the statistics API from the ground up. Optionally, you may also create an interface that showcases its abilities.
- Deploy and Test: Deploy and test your newly developed solution alongside our server using Docker.
If needed, a suitable virtual machine will be provided to you. Depending on your strengths and interests, you may focus on the data science aspects (generation of statistics, data wrangling, etc.) or on creating a frontend (e.g., a dashboard, a Jupyter Notebook) to visualize them.
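One example of the kind of metric such a statistics service might compute, sketched with the standard library only (the reimplementation could use Pandas or PySpark instead): the share of (entity, property) pairs for which the crawled sources disagree. The record fields are illustrative assumptions about the fact data model.

```python
# Sketch of one statistical metric over crawled fact data: the share of
# (entity, property) pairs whose sources report differing values.
# Record fields are illustrative assumptions.

from collections import defaultdict

def conflict_rate(facts: list[dict]) -> float:
    values = defaultdict(set)
    for f in facts:
        values[(f["entity"], f["property"])].add(f["value"])
    if not values:
        return 0.0
    conflicting = sum(1 for v in values.values() if len(v) > 1)
    return conflicting / len(values)

facts = [
    {"entity": "Vienna", "property": "population", "value": 1815231, "source": "A"},
    {"entity": "Vienna", "property": "population", "value": 2000000, "source": "B"},
    {"entity": "Vienna", "property": "country", "value": "Austria", "source": "A"},
]
print(conflict_rate(facts))  # 0.5
```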
- Technologies: Python, CouchDB, PySpark, Pandas, NumPy, Jupyter Notebooks, Docker
- Tags:
- Contact: Marie Aichinger
MSA05 - FactCheck - Serious Games for Information Comparison
The FactCheck framework aims to address the issue of conflicting data on the Web by providing a systematic approach to detect and resolve such discrepancies. It encompasses the entire fact comparison process, including data acquisition, comparison, presentation of results, and advanced analysis features. As a pioneering research initiative of our research group, FactCheck presents several challenging aspects and opportunities in its development and implementation.
Serious games, or gamification, refer to applications with a primary purpose beyond entertainment - such as teaching new skills, crowd-sourcing data, or engaging users with a system in new and innovative ways. Your task is to explore the use of serious games and gamification elements for FactCheck. First, you will identify which aspect(s) of FactCheck you would like to gamify, and then design and implement a prototypical serious game using a technology stack of your choice (e.g., as a progressive Web app) which allows the application to run on the Web and communicate with our APIs and databases.
Aspects you may gamify include…
- Entity Resolution and Linking: given our existing fact data and entity resolution results, have users verify existing entity linking results, or perform the linking themselves using an interactive interface
- Fact Data Exploration: given our existing fact data, provide a gamified interface for users to explore and compare facts
- Feedback on Comparison Results: given our comparison API, have users compare data from various websites, and allow them to give feedback tailored towards improving our comparison processes
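A gamification element common to all three options is a scoring mechanic that rewards correct contributions. A minimal sketch, with the point scheme as an illustrative assumption: the player confirms or rejects proposed results, and consecutive correct answers earn a growing streak bonus.

```python
# Sketch of a scoring loop for a gamified verification task: points per
# correct answer plus a growing streak bonus. Point values are illustrative.

def score_round(answers: list[bool]) -> int:
    """answers: per item, whether the player's verdict matched the gold label."""
    total, streak = 0, 0
    for correct in answers:
        if correct:
            streak += 1
            total += 10 + 2 * (streak - 1)   # growing streak bonus
        else:
            streak = 0                        # a mistake resets the streak
    return total

print(score_round([True, True, False, True]))  # 10 + 12 + 0 + 10 = 32
```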
- Technologies: JavaScript, TypeScript, Angular, React, WebGL, Docker
- Tags:
- Contact: Marie Aichinger
(B) Topics of Master Theses
Please check the listing below for possible topics for a master thesis. In principle, you may also choose from the topics listed in Section (A) above. Those topics are available for a master thesis as well, but usually in a more expanded or advanced form.
- FactChecking: Models and Languages of Precision Metrics for comparing facts on the Internet.
- FactChecking: Flexible, configurable framework for crawlers for extracting facts from web pages.
- FactChecking: AI-based text analysis tools for extracting facts from the Internet.
- FactChecking: Multimedia content (images, audio, video) analysis tools (including the use of Azure AI tools and services) for extracting facts from the Internet.
- FactChecking: Analysis of Cloud-based storage systems/services and design of a storage framework for a FactChecking prototype.
- FactChecking: Analysis and extraction of structured information from videos using state-of-the-art AI technology
- FactChecking: Analysis and extraction of structured information from images using state-of-the-art AI technology
- FactChecking: Analysis and extraction of structured information from text on the Web (news articles, scientific articles, Wikipedia, movie descriptions, etc.) using state-of-the-art AI technology and methods such as named entity recognition, key phrase recognition, and finding linked entities.
- Blockchain-based collection of semantically-correlated statements available on the Web, given by individual persons over time.
- Blockchain-based distributed media content management (e.g., using Blockchain to track images, video).
- Blockchain technology based on a microservice cloud architecture (e.g., following the approach of Edge/Fog Computing).
- Blockchain technology for providing trust in a FactCheck platform (FactCheck is a framework for the detection and resolution of conflicting structured data on the Web).
- Evaluation of platforms of specific Distributed Ledger Technology / Blockchain Technologies that vary in terms of consensus-model, validation-process, privacy-settings, e.g., technology platforms Cardano, Hashgraph, IOTA, Monero, EOS, NEO ([iteratec]).
- Blockchain-based image manipulation detection by using JPEG-specific image encoding information like macroblocks.
- Blockchain-based video manipulation detection by using MPEG-specific video encoding information like macroblocks and motion encoding.
- Enhancing blockchain technology by fast indexing and search/querying functionality using/integrating elastic-search or graph database technology.
- Enhancing blockchain technology by integrating a data model layer that offers a semantically enriched data model (e.g., XML-based, RDF-based, UML-based) to a blockchain application layer.
- Interactive course content components based on Jupyter Notebooks for a dedicated course (e.g., MRE, MRS, MCM, MST, DMP) offered in the Bachelor's or Master's program.
... additional new topics will become available in the near future. In the case of Master Thesis topics, you may also contact Prof. Klas, Prof. Quirchmayr, or a researcher of the MIS group to find out more about possible topics.