Project: Trustworthy multi-scale manifold learning for genomic and transcriptomic data

Basic data

Title:
Trustworthy multi-scale manifold learning for genomic and transcriptomic data
Duration:
01/01/2022 to 01/01/2025
Abstract / short description:
In recent years, large high-dimensional datasets have become
commonplace in biology. For example, single-cell transcriptomics
routinely produces datasets with sample sizes in the hundreds of
thousands of cells and dimensionality in the tens of thousands of genes.
Similarly, genomic datasets can encompass hundreds of thousands of
people’s genomes, profiled using millions of single-nucleotide
polymorphisms. One defining feature of such datasets is their
hierarchical organization, with biologically meaningful structure
present on several levels. Such datasets require adequate
computational methods for data analysis, including unsupervised data
exploration, to allow researchers to compactly represent and make
sense of their data. It is commonplace in single-cell transcriptomics to
generate low-dimensional embeddings of the data, using algorithms
such as t-SNE or UMAP, but existing methods fall short of
representing the hierarchical structure of the data. Whereas they
excel at preserving local structure, they are unable to recapitulate
larger-scale global structure often present in the data, making it
difficult to interpret the embedding correctly. In this project, our first
aim is to develop a dimensionality reduction method able to preserve
crucial properties of high-dimensional data, such as the local cluster
structure, continuous trajectories, and global hierarchical organization.
The second aim is to develop a suite of quality metrics that will allow
us to benchmark existing and novel algorithms on a range of
challenging datasets. Finally, the third aim is to adapt this machinery
to ultra-high-dimensional data from population genomics. On the
technical level, we will rely on k-nearest-neighbour graphs and
graph coarse-graining. Our work will be useful in practical
applications in biology and bioinformatics, while at the same time
being of high interest to the manifold learning part of the machine
learning community.
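
As a rough illustration of the kind of quality metric the second aim refers to, the sketch below computes a k-nearest-neighbour recall: the average fraction of each point's k nearest neighbours in the original space that remain among its k nearest neighbours in the embedding. This is a generic, widely used measure of local structure preservation and is not taken from the project itself; the function names `knn_indices` and `knn_recall`, the Euclidean distance, and the brute-force neighbour search are all assumptions made for the example.

```python
import numpy as np


def knn_indices(X, k):
    """Return the indices of the k nearest neighbours of each row of X.

    Brute-force Euclidean search; only suitable for small illustrative datasets.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Exclude each point from its own neighbour list.
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def knn_recall(X_high, X_low, k=10):
    """Average fraction of high-dimensional k-NN preserved in the embedding."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlap)) / k


# Hypothetical usage: random data and a naive 2D "embedding" that keeps
# only the first two coordinates (a stand-in for t-SNE or UMAP output).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
Y = X[:, :2]
recall = knn_recall(X, Y, k=10)
```

A value near 1 means local neighbourhoods are well preserved, and a value near 0 means they are mostly lost. A metric of this kind says nothing about global or hierarchical structure, which is the gap the abstract highlights.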

Involved staff

Managers

Hertie Institute for Artificial Intelligence in Brain Health (HIAI)
Non-clinical institutes, Faculty of Medicine

Contact persons

Hertie Institute for Artificial Intelligence in Brain Health (HIAI)
Non-clinical institutes, Faculty of Medicine
Institute for Bioinformatics and Medical Informatics (IBMI)
Interfaculty Institutes
Cluster of Excellence: Machine Learning: New Perspectives for Science (CML)
Centers or interfaculty scientific institutions
Tübingen AI Center
Department of Informatics, Faculty of Science

Local organizational units

University Eye Hospital
Center for Ophthalmology
Hospitals and clinical institutes, Faculty of Medicine
Research Center for Ophthalmology
Center for Ophthalmology
Hospitals and clinical institutes, Faculty of Medicine
Werner Reichardt Center for Integrative Neuroscience (CIN)
Centers or interfaculty scientific institutions
University of Tübingen

Funders

Bonn, Nordrhein-Westfalen, Germany