TY - GEN
T1 - To Share or Not to Share
T2 - 31st European MPI Users’ Group Meeting, EuroMPI 2024
AU - Adam, Julien
AU - Besnard, Jean Baptiste
AU - Roussel, Adrien
AU - Jaeger, Julien
AU - Carribault, Patrick
AU - Pérache, Marc
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025/1/1
Y1 - 2025/1/1
AB - The evolution of parallel computing architectures presents new challenges for developing efficient parallelized codes. The emergence of heterogeneous systems has given rise to multiple programming models, each requiring careful adaptation to maximize performance. In this context, we propose reevaluating memory layout designs for computational tasks within larger nodes by comparing various architectures. To gain insight into the performance discrepancies between shared memory and shared-address space settings, we systematically measure the bandwidth between cores and sockets using different methodologies. Our findings reveal significant differences in performance, suggesting that MPI running inside UNIX processes may not fully utilize its intranode bandwidth potential. In light of our work in the MPC thread-based MPI runtime, which can leverage shared memory to achieve higher performance due to its optimized layout, we advocate for enabling the use of shared memory within the MPI standard.
KW - MPI
KW - Memory
KW - NUMA
KW - Programming Models
KW - Thread
U2 - 10.1007/978-3-031-73370-3_6
DO - 10.1007/978-3-031-73370-3_6
M3 - Conference contribution
AN - SCOPUS:85206108784
SN - 9783031733697
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 89
EP - 102
BT - Recent Advances in the Message Passing Interface - 31st European MPI Users’ Group Meeting, EuroMPI 2024, Proceedings
A2 - Blaas-Schenner, Claudia
A2 - Niethammer, Christoph
A2 - Haas, Tobias
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 25 September 2024 through 27 September 2024
ER -