NEARDATA: a new ambitious project for Sano    

Sano is now part of an international consortium working on the project “NEARDATA: Extreme Near-Data Processing Platform”, supported by the European Union Horizon program. The project’s kick-off meeting took place on 10 February 2023 at the University of Rovira i Virgili (URV) in Tarragona; URV is the coordinator of the project.

The goal of NEARDATA is to create an extreme data platform that mediates data flows between Object Storage technology and Data Analytics platforms across the Compute Continuum. The platform will serve as an intermediary data service that intercepts and optimises data flows (S3 API, stream APIs) with high-performance near-data connectors (Cloud/Edge).

The need for such a mediating platform arises because up to 90% of medical data is stored in unsearchable formats. As a consequence, huge pools of unstructured data are locked away in Object stores as bulk data and are very difficult to mine and analyse. Additionally, medical data is highly dispersed and subject to strict privacy and security requirements. Ingesting data from Object Storage into computing analytics services is therefore a major challenge.

The Extreme near-data platform will enable consumption, mining and processing of distributed and federated data without the need to master the logistics of data access across heterogeneous data locations and pools.

The project focuses on three health data domains that contain large volumes of unstructured data: metabolomics (images), genomics (text), and surgery data (video). The Extreme near-data platform will leverage advanced AI technologies in those domains.

Sano is responsible for two subprojects in the genomics domain:      

  • Transcriptomics Use Case. The aim is to develop a pipeline for building a transcriptomics atlas of selected tissues/diseases using HPC and Cloud technologies;
  • Federated Learning framework. The task is to develop a set of tools for running Federated Learning experiments on large-scale genomics data.

NEARDATA aims to produce long-lasting scientific, economic, and societal impacts in the Health sector in Europe through a unique next-generation data infrastructure. The project will boost European leadership in two Health domains (OMICs, Surgery) by creating world-wide reference data spaces using the project’s technologies.

More information about the project: https://neardata.eu/