TY - GEN
T1 - Enhancing MPI+OpenMP Task Based Applications for Heterogeneous Architectures with GPU Support
AU - Ferat, Manuel
AU - Pereira, Romain
AU - Roussel, Adrien
AU - Carribault, Patrick
AU - Steffenel, Luiz Angelo
AU - Gautier, Thierry
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Heterogeneous supercomputers are widespread across HPC systems, and programming efficient applications on these architectures is a challenge. Task-based programming models are a promising way to tackle this challenge. Since OpenMP 4.0 and 4.5, the target directives make it possible to offload pieces of code to GPUs and to express these offloads as tasks with dependencies. Heterogeneous machines can therefore be programmed with MPI+OpenMP(task+target) to exhibit a very high level of concurrent asynchronous operations, in which data transfers, kernel executions, communications, and CPU computations can be overlapped. Hence, it is possible to suspend tasks performing these asynchronous operations on the CPUs and to overlap their completion with the execution of other tasks. Suspended tasks can resume opportunistically at every scheduling point once the associated asynchronous event has completed. We have integrated this feature into the MPC framework, validated it on an AXPY microbenchmark, and evaluated it on an MPI+OpenMP(tasks) implementation of the LULESH proxy application. The results show that we are able to improve asynchronism and overall HPC performance, allowing applications to benefit from asynchronous execution on heterogeneous machines.
AB - Heterogeneous supercomputers are widespread across HPC systems, and programming efficient applications on these architectures is a challenge. Task-based programming models are a promising way to tackle this challenge. Since OpenMP 4.0 and 4.5, the target directives make it possible to offload pieces of code to GPUs and to express these offloads as tasks with dependencies. Heterogeneous machines can therefore be programmed with MPI+OpenMP(task+target) to exhibit a very high level of concurrent asynchronous operations, in which data transfers, kernel executions, communications, and CPU computations can be overlapped. Hence, it is possible to suspend tasks performing these asynchronous operations on the CPUs and to overlap their completion with the execution of other tasks. Suspended tasks can resume opportunistically at every scheduling point once the associated asynchronous event has completed. We have integrated this feature into the MPC framework, validated it on an AXPY microbenchmark, and evaluated it on an MPI+OpenMP(tasks) implementation of the LULESH proxy application. The results show that we are able to improve asynchronism and overall HPC performance, allowing applications to benefit from asynchronous execution on heterogeneous machines.
KW - Distributed Application
KW - GPU computing
KW - OpenMP
KW - Task programming
UR - https://www.scopus.com/pages/publications/85140435320
U2 - 10.1007/978-3-031-15922-0_1
DO - 10.1007/978-3-031-15922-0_1
M3 - Conference contribution
AN - SCOPUS:85140435320
SN - 9783031159213
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 16
BT - OpenMP in a Modern World
A2 - Klemm, Michael
A2 - de Supinski, Bronis R.
A2 - Klinkenberg, Jannis
A2 - Neth, Brandon
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th International Workshop on OpenMP, IWOMP 2022
Y2 - 27 September 2022 through 30 September 2022
ER -