Extreme-scale Data and Computing

…tackles the new fundamental challenges for computer science and engineering research, posed by the unique combination of Modelling and Simulation with Health Data Science and Health Informatics. The computational and data processing needs of Sano research will push the boundaries of current state-of-the-art infrastructures for AI, HPC, big data and cloud computing. This includes alignment of traditional HPC systems with big data analytics and ML/AI workloads, using CPU, GPU, many-core, hybrid, virtualized and containerized environments with the computational needs of systems required to deliver patient-specific care at timescales appropriate for clinical use. Moreover, health data science will benefit from novel approaches in distributed computing and security research, such as Federated Learning, Blockchain, Differential Privacy or Encrypted Computation, which can be applied to medical data in a secure and privacy-conscious manner.

New developments in computer hardware, programming models, cloud computing and emerging services will influence the development, deployment and execution of computational and AI models at extreme scale. This requires constant evaluation of new technologies and platforms, experimentation with novel approaches, and prototyping of new solutions. Developing systems that operate across clinical, research, HPC and Big Data/AI environments will require novel, transparent techniques, and delivering the state of the art in health data science and in-silico techniques will require exascale computing resources. Sano will become a driver for developments within EU-level HPC and cloud initiatives, including PRACE and EuroHPC.

Maciej Malawski

PhD, Team Leader, Extreme Computing

Over 20 years of experience in research in parallel and distributed computing, high performance computing (HPC), grid and cloud technologies, serverless and container-based infrastructures. Interested in innovative applications of these technologies to scientific applications, with a special focus on biomedical research. 

Big data analytics, with experience in large-scale processing of scientific data in cloud infrastructures, contributed to use of Apache Spark and serverless processing of data in high energy physics in collaboration with CERN. 

Scientific workflows with focus on usage of novel and emerging large-scale computing infrastructures, performance evaluation, resource management, scheduling and cost optimization. 

Co-author of over 50 international publications including journal and conference papers, and book chapters. Member of technical program committees of premier conferences on scientific, parallel and distributed computing (SC, ICCS, IPDPS, CCGrid, UCC). Leadership positions in major conferences in the field: general co-chair of Euro-Par 2020 and member of Steering Committee, Area Co-chair IEEE Cluster 2021, BoF Vice-Chair at SC18. Member of editorial board of Future Generation Computer Systems Journal. 

Prizes and distinctions:

  • 2020 The paper "Performance evaluation of heterogeneous cloud functions", published in Concurrency and Computation: Practice and Experience, was among the most downloaded papers in 2018-2019 
  • 2018 AGH Rector’s award for organizational work 
  • 2018 Publons Peer Review Award, for placing in the top 1% of reviewers in Computer Science 
  • 2011 Executable Paper Grand Challenge - 1st prize 
  • 2019 - ongoing Associate Professor, Institute of Computer Science, AGH University of Science and Technology, Kraków, Poland 
    Senior Researcher, Sano Centre for Computational Medicine in Krakow, Poland 
  • 2001 - ongoing Researcher, employed in research projects at the Academic Computer Centre CYFRONET AGH 
  • 2009 - 2019 Assistant Professor, Department of Computer Science AGH 
  • 2015 Adjunct Research Assistant Professor, University of Notre Dame, Center for Research Computing, Notre Dame, USA 
  • 2013 Adjunct Research Assistant Professor, University of Notre Dame, Department of Computer Science and Engineering, Notre Dame, USA 
  • 2011 - 2012 Postdoctoral Research Associate, University of Notre Dame, Center for Research Computing, Notre Dame, USA 
  • 2001 - 2009 Teaching and Research Assistant, Department of Computer Science AGH 

  • 2009 Ph.D., Computer Science, AGH University of Science and Technology, Kraków, Poland  
  • 2004 MSc, Physics, Jagiellonian University, Kraków, Poland 
  • 2001 MSc, Computer Science, AGH University of Science and Technology, Kraków, Poland 

Sano Centre for Computational Medicine 

Czarnowiejska 36 building C5, 30-054, Cracow, Poland 

Team Members

Filip Ślazyk

Junior Scientific Programmer

Holds a BSc in Computer Science obtained at AGH UST in 2021 and is currently pursuing an MSc in Data Science at the same university. His bachelor thesis applied machine learning techniques in a medical context. Interested in machine learning, deep learning and cloud computing. For his master's thesis project, he is researching applications of federated learning in healthcare. Has gained experience at key players in the industry. Loves to spend his free time running, hiking, or playing squash.

Przemysław Jabłecki

MSc Student

Przemysław holds a BSc in Computer Science. He graduated with honours from AGH UST in 2021. His bachelor thesis focused on the integration and comparison of feature selection algorithms applied to data from patients suffering from Hairy Cell Leukemia. Currently, he is pursuing an MSc degree in Data Science and doing research on the application of federated learning to medical image segmentation. Przemysław is mostly interested in deep learning, algorithms and large-scale cloud computing. In his free time, he loves travelling, playing chess and reading crime fiction.

Karol Zając

Junior Scientific Programmer

Currently pursuing engineering studies in Computer Science at the Faculty of Computer Science, Electronics and Telecommunications at AGH University of Science and Technology in Cracow. Participates in the collaborative European project In Silico World (ISW). Especially interested in modelling, simulation and artificial intelligence. Usually spends his leisure time swimming, travelling with friends and doing DIY projects that develop his mental and manual skills.

Jan Przybyszewski

PhD Student

Jan has always been interested in healthcare and spent a short time at Jagiellonian University Medical College. In the end, he decided to pursue a career in engineering - he obtained his MSc degree in Computer Science at AGH UST in Cracow. His thesis focused on using graph neural networks in miRNA-mRNA target prediction. Apart from academic experience, he also developed his software engineering skills by working on commercial projects at companies such as Nomagic and Mercedes-Benz AG. At Sano, he joins the Extreme-scale Data and Computing team to conduct research on the use of Federated Learning in healthcare. In his free time, Jan enjoys reading, playing video games, and doing sports.

Krzysztof Gądek

Junior Scientific Programmer

Currently studying computer science at AGH UST (WIEiT). He has been programming since he was 12 years old, and it is one of his biggest passions. At Sano, he started as an intern on the HPHOB project and now works on MEE development and, occasionally, other programming tasks. His other passions are science (physics, mathematics, especially differential equations, and astronomy), history and sports such as gym, swimming and skiing. A great fan of Balkan culture, he has visited most of the countries in the region, except for Albania. A lover of foreign languages, he is currently studying English and Russian and plans to learn German, Italian and Serbo-Croatian in the future.

Piotr Kica

Junior Scientific Programmer

Currently pursuing a BSc in Computer Science at AGH UST in Cracow. Particularly interested in big data problems and cloud computing solutions. His Sano career started with an internship on data analysis projects, and he now works as part of the In Silico World project. A follower of an active lifestyle, both physical (working out, playing team sports) and mental (playing chess). In his free time he likes to study Norwegian, listen to audiobooks and go to orchestral concerts.

Bartosz Balis

Senior Postdoc

Bartosz Balis is an associate professor at the Institute of Computer Science of the AGH University of Science and Technology, and a Senior Postdoctoral Researcher at the Sano Centre for Computational Medicine. He is also a member of the CERN ALICE experiment. A graduate of AGH and Jagiellonian University, he obtained his PhD and DSc (habilitation) in Computer Science from the AGH University. He is a co-author of over 60 international peer-reviewed scientific publications, including papers in high-ranked journals. His research interests include scientific workflows, data science, e-Science, cloud computing, and distributed computing. Dr Balis has been a member of conference program and organizing committees, including Euro-Par 2020 workshops (General Co-chair), HPCS 2018-19 (Tutorials Co-Chair), the IEEE/ACM SC18 Birds of a Feather Planning Committee and the IEEE/ACM SC16 Workshops Planning Committee. He has participated in national and EU-FP5/FP6/FP7/H2020 research projects: CrossGrid, CoreGRID, K-Wf Grid, ViroLab, Gredia, UrbanFlood, PaaSage and WATERLINE.

Current Projects

The In Silico World project aims at accelerating the uptake of modelling and simulation technologies used for the development and regulatory assessment of medicines and medical devices, by lowering seven identified barriers: development, validation, accreditation, optimisation, exploitation, information, and training.

Computer models informed by experimental data enable us to test hypotheses and make predictions, significantly streamlining the research and development cycle relative to trial and error. When it comes to medicine, experimentation relies on biological samples ranging from cultured cells to whole animals, so increased reliance on modelling has additional benefits. Harnessing Big Data and tremendous advances in computing power could pave the way to minimising and eventually eliminating the need for anything other than in silico 'experimentation' in medical research and development.

The consortium will use an advanced simulation environment developed by Sano to address the scalability and efficiency needs of the solutions developed in the project. This environment provides access to computation and storage resources in local and major European e-infrastructures as well as commercial cloud services. Moreover, Sano works on the performance, scalability and cost efficiency of advanced simulation models running at extreme scale.

More about the project: https://insilico.world/

  • MSc Project 
  • Although federated learning is a promising technique for analysis of medical images, as it may solve some security and privacy issues related to distributed data access, there is still a need to evaluate this technique in large-scale experiments in a distributed environment such as cloud infrastructure. 

The goal of the thesis will be to run large-scale experiments with federated learning on medical image classification tasks. We plan to use public datasets such as chest X-ray images, coming from multiple sources (countries, hospitals), and existing distributed machine learning frameworks. A public cloud and the PL-Grid infrastructure will be used as the computing infrastructure. Various metrics related to the distribution of data, its granularity and partitioning will be investigated to understand their impact on both the efficiency of the learning process and the performance of the infrastructure. It will also be possible to extend the study to assess the impact of possible attacks and their mitigation strategies. 
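As a rough illustration of what varying the partitioning of data across sources can mean in such an experiment, the sketch below splits a labelled dataset among simulated clients with a tunable degree of label skew. It is not taken from the project: the function names, the Dirichlet-based scheme and the parameter `alpha` are assumptions chosen for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients; smaller alpha gives stronger label skew.

    Hypothetical helper for illustration only, not part of any FL framework.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is obtained by normalising independent Gamma draws.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights) or 1.0
        start, cumulative = 0, 0.0
        for client_id, weight in enumerate(weights):
            cumulative += weight / total
            # The last client takes the remainder so that no sample is lost.
            if client_id == num_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


def label_histograms(labels, clients):
    """Per-client label counts, a simple view of how skewed the split is."""
    return [Counter(labels[i] for i in part) for part in clients]


# Toy usage: 300 samples, 3 classes, 4 simulated hospitals.
toy_labels = [i % 3 for i in range(300)]
parts = dirichlet_partition(toy_labels, num_clients=4, alpha=0.5)
histograms = label_histograms(toy_labels, parts)
```

Comparing the histograms for several values of `alpha` gives one simple way to quantify how far a split departs from a uniform distribution of classes across clients.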

  • MSc Project 
  • Federated learning is a technique that allows machine learning models to be trained in a distributed way without transferring the data from its source. It thus has potential applications in medical image analysis, where privacy and security are of great importance. Although there are examples of federated approaches to the analysis of medical images, there is still a need for research in this area and for experiments in distributed environments. 

The goal of the thesis will be to apply federated learning techniques to the problem of medical image segmentation. We plan to use public datasets, such as echocardiography data, coming from multiple sources. The analysis will be performed with distributed computing frameworks such as Flower or FedML, on distributed computing infrastructures such as PL-Grid or a public cloud service. In addition to evaluating the learning process, the goal will also be to evaluate the performance of the distributed computing environment. Further study will also cover possible attacks and the security of the developed solution. Other types of data and machine learning tasks can be considered for comparison as well. 
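To make the idea of training without moving data more concrete, here is a minimal sketch of federated averaging (FedAvg) on a toy linear regression problem. It is plain Python and deliberately does not use the Flower or FedML APIs; the function names, learning rate and toy data are assumptions for illustration. Each client fits the model on its own data, and only the resulting model parameters are sent to the server, which averages them weighted by client dataset size.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """A few epochs of gradient descent for y ~ w*x + b on one client's data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server step: average parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    dims = len(client_weights[0])
    return tuple(
        sum(cw[d] * n for cw, n in zip(client_weights, client_sizes)) / total
        for d in range(dims)
    )


def run_fedavg(clients, rounds=100):
    """Alternate local training and server-side averaging for a number of rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Raw data never leaves a client; only updated parameters are returned.
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Toy usage: two "sites" observe y = 2x + 1 over different ranges of x.
site_a = [(x / 10, 2 * (x / 10) + 1) for x in range(0, 10)]
site_b = [(x / 10, 2 * (x / 10) + 1) for x in range(10, 20)]
w, b = run_fedavg([site_a, site_b])
```

In this noise-free toy case the averaged model approaches w = 2, b = 1. With real medical images the local step would be a neural network training loop, and the number of rounds, local epochs and participating clients become the main parameters to evaluate.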

Publications

Dajda, Jacek; Idzik, Michał; Sroka, Jakub; Pawłowski, Mikołaj; Sikora, Wiktor; Smołka, Maciej; Jabłecki, Przemysław; Ślazyk, Filip; Malawski, Maciej; Majerz, Emilia; Pasternak, Aleksandra; Dzwinel, Witold

Current Trends in Software Engineering Bachelor Theses Journal Article

In: Computing and Informatics, 2021.

Jabłecki, P.; Ślazyk, F.; Malawski, M.

Federated Learning in the Cloud for Analysis of Medical Images - Experience with Open Source Frameworks Conference

2021.

Bubak, M; Czechowicz, K; Gubała, T; Hose, D R; Kasztelnik, M; Malawski, M; Meizner, J; Nowakowski, P; Wood, S

The EurValve model execution environment Journal Article

In: Interface Focus, vol. 11, no. 1, pp. 20200006, 2021, ISSN: 2042-8898.

Malawski, Maciej; Gajek, Adam; Zima, Adam; Balis, Bartosz; Figiela, Kamil

Serverless execution of scientific workflows: Experiments with HyperFlow, AWS Lambda and Google Cloud Functions Journal Article

In: Future Generation Computer Systems, vol. 110, pp. 502–514, 2020, ISSN: 0167-739X.

Malawski, Maciej; Rzadca, Krzysztof (Ed.)

Euro-Par 2020: Parallel Processing Book

Springer International Publishing, Cham, 2020, ISBN: 978-3-030-57674-5.

Tomasiewicz, Dawid; Pawlik, Maciej; Malawski, Maciej; Rycerz, Katarzyna

Foundations for Workflow Application Scheduling on D-Wave System Inproceedings

In: Krzhizhanovskaya, Valeria V; Závodszky, Gábor; Lees, Michael H; Dongarra, Jack J; Sloot, Peter M A; Brissos, Sérgio; Teixeira, João (Ed.): Computational Science – ICCS 2020, pp. 516–530, Springer International Publishing, Cham, 2020, ISBN: 978-3-030-50433-5.

Avati, Valentina; Blaszkiewicz, Milosz; Bocchi, Enrico; Canali, Luca; Castro, Diogo; Cervantes, Javier; Grzanka, Leszek; Guiraud, Enrico; Kaspar, Jan; Kothuri, Prasanth; Lamanna, Massimo; Malawski, Maciej; Mnich, Aleksandra; Moscicki, Jakub; Murali, Shravan; Piparo, Danilo; Tejedor, Enric

Declarative Big Data Analysis for High-Energy Physics: TOTEM Use Case Inproceedings

In: Yahyapour, Ramin (Ed.): Euro-Par 2019: Parallel Processing, pp. 241–255, Springer International Publishing, Cham, 2019, ISBN: 978-3-030-29400-7.

Nowakowski, Piotr; Bubak, Marian; Bartyński, Tomasz; Gubała, Tomasz; Harężlak, Daniel; Kasztelnik, Marek; Malawski, Maciej; Meizner, Jan

Cloud computing infrastructure for the VPH community Journal Article

In: Journal of Computational Science, vol. 24, pp. 169–179, 2018, ISSN: 1877-7503.

Malawski, Maciej; Juve, Gideon; Deelman, Ewa; Nabrzyski, Jarek

Algorithms for cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds Journal Article

In: Future Generation Computer Systems, vol. 48, pp. 1–18, 2015, ISSN: 0167-739X.

Malawski, Maciej; Figiela, Kamil; Nabrzyski, Jarek

Cost minimization for computational applications on hybrid cloud infrastructures Journal Article

In: Future Generation Computer Systems, vol. 29, no. 7, pp. 1786–1794, 2013.

Malawski, Maciej; Bartyński, Tomasz; Bubak, Marian

Invocation of operations from script-based Grid applications Journal Article

In: Future Generation Computer Systems, vol. 26, no. 1, pp. 138–146, 2010, ISSN: 0167-739X.

Ślazyk, F.; Jabłecki, P.; Malawski, M.; Płotka, P.

CXR-FL: Deep Learning-based Chest X-ray Image Analysis Using Federated Learning Conference

22nd International Conference on Computational Science.

Open Positions