Project: GraViLa – Graphs without Labels: Multimodal Structure Learning without Human Supervision

Basic data

Acronym:
GraViLa
Title:
Graphs without Labels: Multimodal Structure Learning without Human Supervision
Duration:
01.08.2024 to 31.12.2028
Abstract / short description:
Multimodal learning focuses on training models with data in more than one modality, such as videos that capture visual and audio information or documents that contain images and text. Current approaches use such data to train large-scale deep learning models without human supervision: they sample pairwise data, e.g., an image-text pair from a website, and train the network, e.g., to distinguish matching from non-matching pairs, so that it learns better representations.
We argue that multimodal learning can do more. By combining information from different sources, multimodal models capture cross-modal semantic entities, and because most multimodal documents are collections of connected modalities and topics, multimodal models should also allow us to capture the inherent high-level topology of such data. The goal of this project is therefore to learn semantic structures from multimodal data, capturing long-range concepts and relations via multimodal and self-supervised learning without human annotation. We will represent this information in the form of a graph, with latent semantic concepts as nodes and their connectivity as edges. Based on this structure, we will extend current unimodal approaches to capture and process data from different modalities in a single structure. Finally, we will explore the challenges and opportunities of the proposed idea with respect to its impact on two main challenges in machine learning: data-efficient learning and fairness in label-free learning.
By bridging the gap between these two parallel trends, multimodal supervision and graph-based representations, we combine their strengths in generating and processing topological data. This will not only allow us to build new applications and tools but also open new ways of processing and understanding multimodal data and concepts at a scale that is currently out of reach.

Staff involved

Project leads

Department of Computer Science
Faculty of Science

Other staff

Department of Computer Science
Faculty of Science

Local institutions

Tübingen AI Center
Department of Computer Science
Faculty of Science

Funders
