TY - GEN
T1 - Automatic OpenCL code generation for multi-device heterogeneous architectures
AU - Li, Pei
AU - Brunet, Elisabeth
AU - Trahay, François
AU - Parrot, Christian
AU - Thomas, Gaël
AU - Namyst, Raymond
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/12/8
Y1 - 2015/12/8
AB - Using multiple accelerators, such as GPUs or Xeon Phis, is an attractive way to improve the performance of large data-parallel applications and to increase the size of their workloads. However, writing an application for multiple accelerators remains challenging today, because moving from a single accelerator to several requires dealing with potentially non-uniform domain decomposition, inter-accelerator data movements, and dynamic load balancing. Writing such code manually is time-consuming and error-prone. In this paper, we propose a new programming tool called STEPOCL, along with a new domain-specific language designed to simplify the development of an application for multiple accelerators. We evaluate both the performance and the usefulness of STEPOCL with three applications and show that: (i) the performance of an application written with STEPOCL scales linearly with the number of accelerators, (ii) the performance of an application written with STEPOCL competes with that of a handwritten version, (iii) workloads too large to fit in the memory of a single device can run on multiple devices, and (iv) thanks to STEPOCL, the number of lines of code required to write an application for multiple accelerators is reduced roughly tenfold.
KW - Accelerators
KW - Code generation
KW - Heterogeneous architectures
KW - OpenCL
UR - https://www.scopus.com/pages/publications/84976471065
U2 - 10.1109/ICPP.2015.105
DO - 10.1109/ICPP.2015.105
M3 - Conference contribution
AN - SCOPUS:84976471065
T3 - Proceedings of the International Conference on Parallel Processing
SP - 959
EP - 968
BT - Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th International Conference on Parallel Processing, ICPP 2015
Y2 - 1 September 2015 through 4 September 2015
ER -