
PALLAS: A Generic Trace Format for Large HPC Trace Analysis

  • CNRS UMR 5157 SAMOVAR
  • INRIA (Institut National de Recherche en Informatique et en Automatique)

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Identifying performance bottlenecks in a parallel application is tedious, especially because it requires analyzing the behaviour of various software components, as bottlenecks may have several causes and symptoms. For example, a load imbalance may cause long MPI waiting times, or contention on disk may degrade the performance of I/O operations. Detecting a performance problem means investigating the execution of an application and applying several performance analysis techniques. To do so, one can use a tracing tool to collect information describing the behaviour of the application. At the end of the execution, a trace file in a specific format is available to the application user, who can use it to conduct a complete post-mortem investigation. Several challenges emerge from the generation and use of traces. Tracing may alter the performance of the application, and can create thousands of heavy trace files, especially at large scale. Most importantly, the post-mortem analysis needs to load these thousands of trace files in memory and process them. This quickly becomes impractical for large-scale applications, as memory gets exhausted and the number of opened files exceeds the system capacity. In this paper, we propose PALLAS, a generic trace format tailored for conducting various post-mortem performance analyses of traces describing large executions of HPC applications. During the execution of the application, PALLAS collects events and detects their repetitions on-the-fly. When storing the trace to disk, PALLAS groups the data from similar events or groups of events together in order to speed up later trace reading. We demonstrate that the PALLAS online detection of the program structure does not significantly degrade the performance of the applications. Moreover, the PALLAS format allows faster trace analysis than the other evaluated trace formats.
Overall, the PALLAS trace format enables the interactive trace analysis that a user needs when investigating a performance problem.
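The abstract names two mechanisms: detecting repetitions while events are being recorded, and grouping the data of similar events when the trace is stored. The Python sketch below illustrates both ideas in toy form. It is not the PALLAS implementation, format, or API: the `TraceRecorder`, `Loop`, and `expand` names, the greedy fixed-window folding of repeated sequences, and the per-event duration lists are hypothetical simplifications chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    """Hypothetical structure token: `body` executed `count` times in a row."""
    body: tuple
    count: int


class TraceRecorder:
    """Toy recorder (not PALLAS): folds repetitions on the fly and
    groups event durations per event name."""

    def __init__(self, max_window=8):
        self.max_window = max_window          # longest repeated body we try to match
        self.tokens = []                      # folded structure: event names and Loops
        self.durations = defaultdict(list)    # event name -> contiguous list of durations

    def record(self, name, duration):
        # Grouped storage: every duration of one event type lands in one list.
        self.durations[name].append(duration)
        self.tokens.append(name)
        self._fold()

    def _fold(self):
        # Greedy folding; one fold may enable another, so repeat until stable.
        changed = True
        while changed:
            changed = False
            for k in range(1, self.max_window + 1):
                n = len(self.tokens)
                if n < k + 1:
                    break
                tail = tuple(self.tokens[n - k:])
                prev = self.tokens[n - k - 1]
                # Case 1: the tail is one more iteration of the preceding loop.
                if isinstance(prev, Loop) and prev.body == tail:
                    self.tokens[n - k - 1:] = [Loop(prev.body, prev.count + 1)]
                    changed = True
                    break
                # Case 2: the last k tokens repeat the k tokens just before them.
                if n >= 2 * k and tuple(self.tokens[n - 2 * k:n - k]) == tail:
                    self.tokens[n - 2 * k:] = [Loop(tail, 2)]
                    changed = True
                    break


def expand(tokens):
    """Rebuild the flat event sequence from the folded structure."""
    out = []
    for t in tokens:
        if isinstance(t, Loop):
            for _ in range(t.count):
                out.extend(expand(t.body))
        else:
            out.append(t)
    return out


if __name__ == "__main__":
    rec = TraceRecorder()
    seq = ["MPI_Init"] + ["MPI_Send", "MPI_Recv"] * 4 + ["MPI_Finalize"]
    for i, name in enumerate(seq):
        rec.record(name, float(i))
    print(rec.tokens)
    # ['MPI_Init', Loop(body=('MPI_Send', 'MPI_Recv'), count=4), 'MPI_Finalize']
```

In this toy layout, an analysis that only needs the `MPI_Send` durations reads one contiguous list instead of scanning every record, which is the kind of read-time benefit the abstract attributes to grouping similar events. The actual on-disk organization and repetition-detection algorithm used by PALLAS are not described on this page.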

Original language: English
Title of host publication: Proceedings - 2025 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 273-284
Number of pages: 12
Edition: 2025
ISBN (Electronic): 9798331532376
Publication status: Published - 1 Jan 2025
Event: 39th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025 - Milan, Italy
Duration: 3 Jun 2025 to 7 Jun 2025

Conference

Conference: 39th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025
Country/Territory: Italy
City: Milan
Period: 3/06/25 to 7/06/25

Keywords

  • performance analysis
  • trace format
