CAPIO: Cross-Application Programmable I/O
MARTINELLI, ALBERTO RICCARDO
2019/2020
Abstract
In the Exascale era, the gap between computation time and I/O time is widening, and data-intensive problems are now common in the HPC world. Complex applications are envisioned as data-driven workflows, i.e., directed acyclic graphs (DAGs) where nodes represent independent parallel applications (e.g., MPI, Apache Spark, PyTorch) and edges represent parallel inter-application communications. For example, a scientific simulation generates files that describe its evolution, and another application reads these files to create a 3D visualization of the simulation. This approach is inefficient because files are used as the communication medium. This work presents a new framework, CAPIO, that allows independently designed applications that communicate via files to communicate using shared memory and messages instead. It also makes it possible to choose the communication pattern and the data transformation. CAPIO is designed to increase the performance of existing workflows while modifying the original applications as little as possible, so that the programmer can focus on the application logic rather than on I/O performance issues (e.g., synchronization and exchange of messages). This work also opens up possibilities for future development: CAPIO could be extended to reason about communicating distributed abstract data types rather than collections of files, making it effective for building new solutions as well.

File | Size | Format | |
---|---|---|---|
803731_thesismartinelli.pdf (not available; type: other attached material) | 878.37 kB | Adobe PDF | |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/30265