TY - GEN
T1 - An MPI halo-cell implementation for zero-copy abstraction
AU - Besnard, Jean Baptiste
AU - Malony, Allen
AU - Shende, Sameer
AU - Pérache, Marc
AU - Carribault, Patrick
AU - Jaeger, Julien
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/9/21
Y1 - 2015/9/21
N2 - In the race for Exascale, the advent of many-core processors will bring a shift in parallel computing architectures to systems of much higher concurrency, but with a relatively smaller memory per thread. This shift raises concerns for the adaptability of HPC software, from the current generation to the brave new world. In this paper, we study domain splitting on an increasing number of memory areas as an example problem where negative performance impact on computation could arise. We identify the specific parameters that drive scalability for this problem, and then model the halo-cell ratio on common mesh topologies to study the memory and communication implications. Such analysis argues for the use of shared-memory parallelism, such as with OpenMP, to address the performance problems that could occur. In contrast, we propose an original solution based entirely on MPI programming semantics, while providing the performance advantages of hybrid parallel programming. Our solution transparently replaces halo-cell transfers with pointer exchanges when MPI tasks are running on the same node, effectively removing memory copies. The results we present demonstrate gains in terms of memory and computation time on Xeon Phi (compared to OpenMP-only and MPI-only) using a representative domain decomposition benchmark.
AB - In the race for Exascale, the advent of many-core processors will bring a shift in parallel computing architectures to systems of much higher concurrency, but with a relatively smaller memory per thread. This shift raises concerns for the adaptability of HPC software, from the current generation to the brave new world. In this paper, we study domain splitting on an increasing number of memory areas as an example problem where negative performance impact on computation could arise. We identify the specific parameters that drive scalability for this problem, and then model the halo-cell ratio on common mesh topologies to study the memory and communication implications. Such analysis argues for the use of shared-memory parallelism, such as with OpenMP, to address the performance problems that could occur. In contrast, we propose an original solution based entirely on MPI programming semantics, while providing the performance advantages of hybrid parallel programming. Our solution transparently replaces halo-cell transfers with pointer exchanges when MPI tasks are running on the same node, effectively removing memory copies. The results we present demonstrate gains in terms of memory and computation time on Xeon Phi (compared to OpenMP-only and MPI-only) using a representative domain decomposition benchmark.
KW - Ghost-cells
KW - MPI
KW - MPI halo
KW - Memory
KW - Zero-copy
U2 - 10.1145/2802658.2802669
DO - 10.1145/2802658.2802669
M3 - Conference contribution
AN - SCOPUS:84983405114
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 22nd European MPI Users' Group Meeting, EuroMPI 2015
PB - Association for Computing Machinery
T2 - 22nd European MPI Users' Group Meeting, EuroMPI 2015
Y2 - 21 September 2015 through 23 September 2015
ER -