TY - GEN
T1 - Leveraging Interaction between Memory Footprint and Parallelism Degree for efficient GPU Portings
AU - Boichot, Mickaël
AU - Roussel, Adrien
AU - Brunet, Elisabeth
AU - Carribault, Patrick
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
AB - Porting large HPC applications entirely on a Graphics Processing Unit (GPU) can be challenging. Some code portions are indeed unsuitable for GPU porting. Therefore, selecting the most profitable code parts for GPU porting is crucial. Many profiling tools address this issue, but their overhead is non-negligible for large HPC test cases. Moreover, the extracted code parts might not be the best candidates for any input set size. We present an approach that extrapolates the behavior of pre-selected code parts from different input set size runs on a target GPU. This enables developers to evaluate the application's parallelism potential and memory footprint prior to GPU porting. We applied our approach to several HPC mini-applications and evaluated the extrapolations through a comparison to the existing GPU versions, as ground truth, on different vendors' GPUs. Our results provide input set sizes of magnitude leading to GPU memory saturation and recommend which pre-selected code parts should be further studied for GPU porting.
KW - Application Porting
KW - GPU
KW - HPC
KW - Heterogeneous Architectures
UR - https://www.scopus.com/pages/publications/105015531234
U2 - 10.1109/IPDPSW66978.2025.00136
DO - 10.1109/IPDPSW66978.2025.00136
M3 - Conference contribution
AN - SCOPUS:105015531234
T3 - Proceedings - 2025 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2025
SP - 857
EP - 865
BT - Proceedings - 2025 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2025
Y2 - 3 June 2025 through 7 June 2025
ER -