TY - GEN
T1 - An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX
AU - Pereira, Romain
AU - Roussel, Adrien
AU - Tsuji, Miwako
AU - Carribault, Patrick
AU - Sato, Mitsuhisa
AU - Murai, Hitoshi
AU - Gautier, Thierry
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/1/11
Y1 - 2024/1/11
N2 - The adoption of ARM processor architectures is on the rise in the HPC ecosystem. The Fugaku supercomputer is a homogeneous ARM-based machine and is among the most powerful machines in the world. In the programming world, dependent task-based programming models are gaining traction due to their many advantages: dynamic load balancing, implicit expression of communication/computation overlap, early-bird communication posting, etc. MPI and OpenMP are two widespread programming standards that make task-based programming possible at the distributed memory level. Despite its many advantages, the mixed use of these standard programming models with dependent tasks is still under-evaluated on large-scale machines. In this paper, we provide an overview of mixing the OpenMP dependent tasking model with MPI using state-of-the-art software stacks (GCC-13, Clang 17, MPC-OMP). We show the level of performance to expect when porting applications to such mixed use of the standards on the Fugaku supercomputer, using two benchmarks (Cholesky, HPCCG) and a proxy application (LULESH). We show that the software stack, resource binding, and communication progression mechanisms are factors that have a significant impact on performance. On distributed applications, performance reaches up to 80% efficiency for task-based applications like HPCCG. We also point out a few areas of improvement in OpenMP runtimes.
AB - The adoption of ARM processor architectures is on the rise in the HPC ecosystem. The Fugaku supercomputer is a homogeneous ARM-based machine and is among the most powerful machines in the world. In the programming world, dependent task-based programming models are gaining traction due to their many advantages: dynamic load balancing, implicit expression of communication/computation overlap, early-bird communication posting, etc. MPI and OpenMP are two widespread programming standards that make task-based programming possible at the distributed memory level. Despite its many advantages, the mixed use of these standard programming models with dependent tasks is still under-evaluated on large-scale machines. In this paper, we provide an overview of mixing the OpenMP dependent tasking model with MPI using state-of-the-art software stacks (GCC-13, Clang 17, MPC-OMP). We show the level of performance to expect when porting applications to such mixed use of the standards on the Fugaku supercomputer, using two benchmarks (Cholesky, HPCCG) and a proxy application (LULESH). We show that the software stack, resource binding, and communication progression mechanisms are factors that have a significant impact on performance. On distributed applications, performance reaches up to 80% efficiency for task-based applications like HPCCG. We also point out a few areas of improvement in OpenMP runtimes.
KW - Dependency
KW - Graph
KW - HPC
KW - MPI
KW - OpenMP
KW - Task
U2 - 10.1145/3636480.3637094
DO - 10.1145/3636480.3637094
M3 - Conference contribution
AN - SCOPUS:85182952070
T3 - ACM International Conference Proceeding Series
SP - 7
EP - 16
BT - Proceedings of International Conference on High Performance Computing in Asia-Pacific Region Workshops, HPC Asia 2024 Workshops
PB - Association for Computing Machinery
T2 - 2024 International Conference on High Performance Computing in Asia-Pacific Region Workshops, HPC Asia 2024 Workshops
Y2 - 25 January 2024
ER -