189. Interpretable protein function predictions with deepFRI2

Paweł Szczerbiak, Postdoctoral Researcher in Structural and Functional Genomics Group, Sano, PL

Abstract:

Protein function prediction remains a key challenge in biology due to the rapid growth of sequence and structure data enabled by high-throughput technologies and recent advances in protein structure prediction. In my talk I will introduce deepFRI2, an upgraded version of deepFRI (Deep Functional Residue Identification). Like its predecessor, deepFRI2 operates in two complementary modes: sequence-based and sequence+structure-based. Architecturally, the model consists of two main components: a structural prober, which applies shallow convolutions over distograms for improved interpretability, and a sequence analyzer, powered by a protein language model with lightweight attention, which captures evolutionary and sequence-derived signals. Guided by principles of simplicity and robustness, deepFRI2 uses only a few million parameters yet achieves strong performance across all evaluated benchmarks, improving upon deepFRI and reaching state-of-the-art results. At the same time, it preserves a high level of interpretability and scalability, two often overlooked but crucial features.
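
For readers unfamiliar with distograms, the structural input mentioned above can be illustrated with a minimal sketch. It assumes a distogram is a matrix of binned pairwise distances between residues; the function name, the C-alpha coordinates, and the bin edges are hypothetical and chosen only for illustration. They are not taken from deepFRI2.

```python
import math
from bisect import bisect_right


def distogram(coords, bin_edges):
    """Return an L x L matrix of distance-bin indices for L residue coordinates.

    Illustrative only: not the deepFRI2 implementation.
    """
    n = len(coords)
    bins = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            # bisect_right counts the edges <= d, which is the bin index
            b = bisect_right(bin_edges, d)
            bins[i][j] = bins[j][i] = b  # the matrix is symmetric
    return bins


# Hypothetical C-alpha coordinates (in angstroms) for four residues
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 10.0, 0.0)]
edges = [4.0, 8.0, 12.0]  # illustrative bin edges (assumption)

D = distogram(ca, edges)
for row in D:
    print(row)
```

A matrix of this kind can be treated as a one-channel image, which is why convolutions are a natural fit for it.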

About the author:

Paweł started his journey in science by studying computational and medical physics at the Lodz University of Technology. His BSc thesis was devoted to semiconductor laser modelling. He completed his MSc and PhD degrees in theoretical physics at the University of Warsaw, where his research focused on cosmology and particle physics. His scientific interests underwent a tectonic shift in 2019, when he joined the Structural and Functional Genomics Laboratory at the Małopolska Centre of Biotechnology as a postdoctoral research associate. Since 2024, he has been a postdoctoral researcher in Tomasz Kościółek’s research group at Sano – Centre for Computational Medicine. His current research focuses on methods for protein structure and function prediction, with particular emphasis on their computational aspects. More generally, he is interested in applying mathematical and machine learning techniques in science.