Adaptive OpenMP for large NUMA nodes

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

The advent of multicore processors favors hybrid programming models such as MPI+OpenMP. OpenMP runtimes must therefore perform well across a wide range of thread counts. At one end is a small number of threads (one MPI task per socket, with OpenMP inside each socket). At the other is a large number (one MPI task per node, with OpenMP inside each node). To address this, we propose a mechanism that improves thread-synchronization performance across this whole spectrum. It relies on a hierarchical tree that is traversed differently depending on the number of threads in the parallel region. Our approach delivers high performance for both thread activation (the parallel construct) and thread synchronization (the barrier construct). Several papers have studied hierarchical structures for launching and synchronizing OpenMP threads [1, 2]. They evaluated tree-based approaches for distributing and synchronizing threads, but did not explore mixed hierarchical solutions.
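The abstract does not spell out the tree algorithm itself. As a rough illustration of tree-based barrier synchronization, here is a minimal sketch of a textbook sense-reversing combining-tree barrier. It assumes C11 atomics and POSIX threads. It is not the authors' mechanism, and all names (`tb_init`, `tb_wait`, `ARITY`, `run_demo`) are hypothetical. The tree is built from the team size. A team of at most `ARITY` threads therefore gets a single flat counter, while larger teams climb roughly log_ARITY(n) levels. This is only a crude stand-in for adapting the traversal to the number of threads.

```c
/* Hypothetical sketch, not the paper's runtime: a sense-reversing
 * combining-tree barrier built with C11 atomics and POSIX threads. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ARITY 4          /* assumed fan-in of each tree node */
#define MAX_NODES 64     /* enough for 128 threads with ARITY 4 */
#define MAX_THREADS 128

typedef struct {
    atomic_int arrived;  /* arrivals at this node in the current episode */
    int expected;        /* number of children (threads or sub-nodes) */
    int parent;          /* index of the parent node, -1 for the root */
} tree_node;

typedef struct {
    tree_node nodes[MAX_NODES];
    atomic_bool sense;   /* global release flag, flipped once per episode */
} tree_barrier;

/* Build the tree bottom-up; leaves occupy indices 0 .. ceil(n/ARITY)-1. */
static void tb_init(tree_barrier *b, int nthreads) {
    int offset = 0, width = nthreads; /* width = children at this level */
    do {
        int nnodes = (width + ARITY - 1) / ARITY;
        assert(offset + nnodes <= MAX_NODES);
        for (int j = 0; j < nnodes; j++) {
            tree_node *nd = &b->nodes[offset + j];
            int rem = width - j * ARITY;
            atomic_init(&nd->arrived, 0);
            nd->expected = rem < ARITY ? rem : ARITY;
            nd->parent = (nnodes == 1) ? -1 : offset + nnodes + j / ARITY;
        }
        offset += nnodes;
        width = nnodes;
    } while (width > 1);
    atomic_init(&b->sense, false);
}

/* Last arrival at each node climbs; last arrival at the root releases. */
static void tb_wait(tree_barrier *b, int tid, bool *local_sense) {
    bool my_sense = !*local_sense;   /* sense value for this episode */
    *local_sense = my_sense;
    int n = tid / ARITY;             /* start at this thread's leaf */
    for (;;) {
        tree_node *nd = &b->nodes[n];
        if (atomic_fetch_add(&nd->arrived, 1) + 1 < nd->expected)
            break;                         /* not last here: go wait */
        atomic_store(&nd->arrived, 0);     /* reset node for next episode */
        if (nd->parent < 0) {
            atomic_store(&b->sense, my_sense); /* root: release everyone */
            return;
        }
        n = nd->parent;                    /* climb one level */
    }
    while (atomic_load(&b->sense) != my_sense)
        sched_yield();                     /* wait for the root's release */
}

static tree_barrier g_bar;
static atomic_int g_arrivals, g_failures;
static int g_nthreads, g_episodes;
static int g_tids[MAX_THREADS];

static void *worker(void *arg) {
    int tid = *(int *)arg;
    bool local_sense = false;
    for (int e = 0; e < g_episodes; e++) {
        atomic_fetch_add(&g_arrivals, 1);
        tb_wait(&g_bar, tid, &local_sense);
        /* Past barrier e, every thread must have arrived e+1 times. */
        if (atomic_load(&g_arrivals) < (e + 1) * g_nthreads)
            atomic_fetch_add(&g_failures, 1);
    }
    return NULL;
}

/* Runs `episodes` barriers on `nthreads` threads; 1 if none left early. */
int run_demo(int nthreads, int episodes) {
    pthread_t th[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    tb_init(&g_bar, nthreads);
    atomic_store(&g_arrivals, 0);
    atomic_store(&g_failures, 0);
    g_nthreads = nthreads;
    g_episodes = episodes;
    for (int i = 0; i < nthreads; i++) {
        g_tids[i] = i;
        pthread_create(&th[i], NULL, worker, &g_tids[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(th[i], NULL);
    return atomic_load(&g_failures) == 0 &&
           atomic_load(&g_arrivals) == nthreads * episodes;
}
```

For example, `run_demo(8, 200)` runs 200 barrier episodes on eight threads. It returns 1 if no thread ever left a barrier before all eight had arrived. A real runtime would likely also pin threads to cores and pad tree nodes to cache lines to avoid false sharing. The same tree shape could plausibly be reused to fan out wake-ups at the start of a parallel region.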

Original language: English
Title of host publication: OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, IWOMP 2012, Proceedings
Pages: 254-257
Number of pages: 4
Publication status: Published - 18 Jun 2012
Externally published: Yes
Event: 8th International Workshop on OpenMP, IWOMP 2012 - Rome, Italy
Duration: 11 Jun 2012 to 13 Jun 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7312 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Workshop on OpenMP, IWOMP 2012
Country/Territory: Italy
City: Rome
Period: 11/06/12 to 13/06/12
