53. High Performance Statistical and Data Mining Analysis of Omics Data: Experiences at University Magna Graecia of Catanzaro

Mario Cannataro – Data Analytics Research Center & Department of Medical and Surgical Sciences University “Magna Græcia” of Catanzaro, Italy

Abstract

Omics sciences (e.g., genomics, proteomics, and interactomics) are attracting increasing interest in the scientific community thanks to novel high-throughput platforms for investigating the cell machinery. They play a central role in so-called P4 (predictive, preventive, personalized, and participatory) medicine, and in particular in cancer research. High-throughput experimental platforms and clinical diagnostic tools, such as next-generation sequencing, microarrays, mass spectrometry, and medical imaging, are producing overwhelming volumes of molecular and clinical data, and the storage, integration, and analysis of such data are today the main bottleneck of bioinformatics pipelines.

This Big Data trend in bioinformatics poses new challenges both for the efficient storage and integration of the data and for their efficient preprocessing and analysis. Managing omics and clinical data therefore requires both storage infrastructure and algorithms and software pipelines for data preprocessing, integration, analysis, and sharing. Moreover, as is already happening in several application fields, the service-oriented model enabled by the Cloud is increasingly being adopted in bioinformatics.

Parallel Computing offers the computational power to face this Big Data trend, while Cloud Computing is a key technology to hide the complexity of computing infrastructures, to reduce the cost of the data analysis task, and to change the overall model of biomedical and bioinformatics research towards a service-oriented model.

The talk introduces the main omics data (e.g., gene expression and SNPs, mass spectra, protein-protein interactions) and discusses some parallel and distributed bioinformatics tools and their application in real case studies in cancer research, as well as recent initiatives that exploit international Electronic Health Records to face COVID-19. Topics include:

  • preprocessing and mining of microarray data for pharmacogenomics applications,
  • biological network alignment, community detection, and applications to the brain connectome,
  • integrative bioinformatics, integration and enrichment of biological pathways,
  • analysis of international Electronic Health Records to face the COVID-19 pandemic: the Consortium for Clinical Characterization of COVID-19 by EHR (4CE).

About the author

Mario Cannataro is a Full Professor of computer engineering at the University “Magna Græcia” of Catanzaro, Italy, and the Director of the Data Analytics Research Center. His current research interests include parallel computing, bioinformatics, health informatics, and artificial intelligence. He has published three books and more than 300 papers in international journals and conference proceedings. Mario Cannataro is a Senior Member of ACM, ACM SIGBio, IEEE, BITS (Bioinformatics Italian Society), and SIBIM (Italian Society of Biomedical Informatics).