Paweł Szczerbiak, Lukasz M. Szydlowski, Witold Wydmański, P. Douglas Renfrew, Julia Koehler Leman, Tomasz Kosciolek

Advances in protein structure prediction have led to a remarkable increase in the number of high-quality 3D models, underscoring the need for effective computational methods to handle and explore this structural data. Our study explores structural clusters derived from three sources: the AlphaFold Protein Structure Database (AFDB); a high-quality subset of ESMAtlas; and the Microbiome Immunity Project (MIP). We generate a unified low-dimensional representation of the protein structure space and show that, although each database occupies distinct regions, their functional profiles overlap substantially. Key biological functions cluster within specific regions, indicating a common functional landscape across these diverse datasets. To support data exploration, we developed an open-access web server. These findings provide a foundation for further research into protein sequence-structure-function relationships, including taxonomic classification, environmental influences, and functional specificity.

DOI: https://doi.org/10.1101/2024.08.14.607935

Keywords: protein structure prediction, 3D models, AlphaFold Protein Structure Database (AFDB), ESMAtlas, Microbiome Immunity Project (MIP), structural clusters, low-dimensional representation, functional profiles, biological functions, open-access web server, protein sequence-structure-function relationships