TY - GEN
T1 - Deep jam
T2 - 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005
AU - Carribault, Patrick
AU - Cohen, Albert
AU - Jalby, William
PY - 2005/12/1
Y1 - 2005/12/1
N2 - A number of compute-intensive applications suffer from performance loss due to the lack of instruction-level parallelism in sequences of dependent instructions. This is particularly accurate on wide-issue architectures with large register banks, when the memory hierarchy (locality and bandwidth) is not the dominant bottleneck. We consider two real applications from computational biology and from cryptanalysis, characterized by long sequences of dependent instructions, irregular control-flow and intricate scalar and array dependence patterns. Although these applications exhibit excellent memory locality and branch-prediction behavior, state-of-the-art loop transformations and back-end optimizations are unable to exploit much instruction-level parallelism. We show that good speedups can be achieved through deep jam, a new transformation of the program control- and data-flow. Deep jam combines scalar and array renaming with a generalized form of recursive unroll-and-jam; it brings together independent instructions across irregular control structures, removing memory-based dependences. This optimization contributes to the extraction of fine-grain parallelism in irregular applications. We propose a feedback-directed deep jam algorithm, selecting a jamming strategy, function of the architecture and application charactristics.
AB - A number of compute-intensive applications suffer from performance loss due to the lack of instruction-level parallelism in sequences of dependent instructions. This is particularly accurate on wide-issue architectures with large register banks, when the memory hierarchy (locality and bandwidth) is not the dominant bottleneck. We consider two real applications from computational biology and from cryptanalysis, characterized by long sequences of dependent instructions, irregular control-flow and intricate scalar and array dependence patterns. Although these applications exhibit excellent memory locality and branch-prediction behavior, state-of-the-art loop transformations and back-end optimizations are unable to exploit much instruction-level parallelism. We show that good speedups can be achieved through deep jam, a new transformation of the program control- and data-flow. Deep jam combines scalar and array renaming with a generalized form of recursive unroll-and-jam; it brings together independent instructions across irregular control structures, removing memory-based dependences. This optimization contributes to the extraction of fine-grain parallelism in irregular applications. We propose a feedback-directed deep jam algorithm, selecting a jamming strategy, function of the architecture and application charactristics.
U2 - 10.1109/PACT.2005.16
DO - 10.1109/PACT.2005.16
M3 - Conference contribution
AN - SCOPUS:33746733539
SN - 076952429X
SN - 9780769524290
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 291
EP - 300
BT - 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005
Y2 - 17 September 2005 through 21 September 2005
ER -