TY - GEN
T1 - An improvement of OpenMP pipeline parallelism with the BatchQueue algorithm
AU - Preud'homme, Thomas
AU - Sopena, Julien
AU - Thomas, Gaël
AU - Folliot, Bertil
PY - 2012/12/1
Y1 - 2012/12/1
N2 - In the context of multicore programming, pipeline parallelism is a solution to easily transform a sequential program into a parallel one without requiring a complete rewrite of the code. The OpenMP stream-computing extension presented by Pop and Cohen extends OpenMP to handle pipeline parallelism. However, their communication algorithm relies on Multiple-Producer-Multiple-Consumer queues, while pipelined applications mostly deal with linear chains of communication, i.e., with only a single producer and a single consumer. To improve the performance of the OpenMP stream extension, we propose to add a more specialized Single-Producer-Single-Consumer communication algorithm called BatchQueue and to select it for one-to-one communication. Our evaluation shows that BatchQueue is then able to improve the throughput by up to a factor of 2 on an 8-core machine, for both example applications and real applications. Our study therefore shows that using specialized and efficient communication algorithms can have a significant impact on the overall performance of pipelined applications.
AB - In the context of multicore programming, pipeline parallelism is a solution to easily transform a sequential program into a parallel one without requiring a complete rewrite of the code. The OpenMP stream-computing extension presented by Pop and Cohen extends OpenMP to handle pipeline parallelism. However, their communication algorithm relies on Multiple-Producer-Multiple-Consumer queues, while pipelined applications mostly deal with linear chains of communication, i.e., with only a single producer and a single consumer. To improve the performance of the OpenMP stream extension, we propose to add a more specialized Single-Producer-Single-Consumer communication algorithm called BatchQueue and to select it for one-to-one communication. Our evaluation shows that BatchQueue is then able to improve the throughput by up to a factor of 2 on an 8-core machine, for both example applications and real applications. Our study therefore shows that using specialized and efficient communication algorithms can have a significant impact on the overall performance of pipelined applications.
U2 - 10.1109/ICPADS.2012.55
DO - 10.1109/ICPADS.2012.55
M3 - Conference contribution
AN - SCOPUS:84874081132
SN - 9780769549033
T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
SP - 348
EP - 355
BT - Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems, ICPADS 2012
T2 - 18th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2012
Y2 - 17 December 2012 through 19 December 2012
ER -