95. Leveraging protein structure information and deep learning to functionally annotate the microbiome

Tomasz Kościółek – Małopolska Centre of Biotechnology, Jagiellonian University, Kraków, Poland

Abstract

The microbiome is a diverse, dynamic and individualized ecosystem important to human health. Through decades of studies and advances in DNA sequencing technology, the research community has been able to thoroughly characterize the microbiome across populations. However, the field is held back from turning this knowledge into actionable therapies by the paucity of functional information needed to understand how the microbiome impacts the host. Current homology-based techniques allow only up to 50% of gut microbiome genes to be annotated. We bypass this limitation by relying on molecular biology principles, i.e. protein sequence-structure-function relationships, and on deep learning, as implemented in deepFRI. By using language models and graph convolutional networks, we are able to predict functional categories with high accuracy and recall, irrespective of the proximity to known, well-annotated model organisms (e.g. Escherichia coli). On real microbiome data, we demonstrate that the deep learning-based approach is highly concordant with the current state of the art while providing >90% coverage. We expect to further apply this approach to all available microbiome metagenomics data. In doing so, we will create a comprehensive functional picture of the human gut microbiome and open up the field to biomarker discovery and to personalized microbiome-based medicine.

About the author

Dr. Tomasz Kościółek is the head of the Structural and Functional Genomics Group, established in 2019. He received his PhD from University College London in 2016 and, from 2016 to 2019, was a postdoctoral fellow in the group of Rob Knight (University of California San Diego, USA), where he developed bioinformatics methods supporting the analysis of microbiome data. Currently, his group's main research focus is the human gut microbiome and the development of bioinformatics methods to describe the biological functions performed by the microbiome and how they change over time. The aim of the group is to develop biomarkers that allow for the early diagnosis of diseases associated with changes in the composition of the microbiome (e.g. Crohn’s disease) and strategies for changing the composition of the microbiome to promote health. So far, the group has obtained funding from NAWA, NCN and NCBiR programs.