PALLAS: HPC trace analysis at scale

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Identifying performance bottlenecks in a parallel application is tedious, especially because it requires analyzing the behaviour of various software components, as bottlenecks may have several causes and symptoms. Tracing tools are used to collect information describing the behaviour of the application. At the end of the execution, a trace file in a specific format is available to the application user, which can be used to conduct a complete post-mortem investigation. However, these tools may alter the performance of the application, and can create thousands of heavy trace files, especially at a large scale. Most importantly, the post-mortem analysis needs to load and process these files. This quickly becomes impractical for large-scale applications, as memory gets exhausted and the number of opened files exceeds the system capacity. We propose PALLAS, a generic trace format tailored for conducting various post-mortem performance analyses of traces describing large executions of HPC applications. During the execution of the application, PALLAS collects events and detects their repetitions on-the-fly. When storing the trace to disk, PALLAS groups the data from similar events or groups of events together in order to later speed up trace reading. We demonstrate that the PALLAS online detection of the program structure does not significantly degrade the performance of the applications. Moreover, the PALLAS format allows faster trace analysis compared to other evaluated trace formats. Overall, the PALLAS trace format allows an interactive analysis of a trace that is required when a user investigates a performance problem.
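
The two ideas named in the abstract, detecting repetitions as events arrive and grouping the data of similar events together, can be illustrated with a deliberately simplified sketch. This is not the PALLAS implementation or its API: the `TraceRecorder` class, its fields, and the event names are hypothetical, and the sketch only detects immediate repetitions of a single event, whereas the abstract describes detecting repeated groups of events as well.

```python
from collections import defaultdict


class TraceRecorder:
    """Toy recorder (hypothetical, not the PALLAS API)."""

    def __init__(self):
        # Compact program structure: list of [event_name, repeat_count].
        self.structure = []
        # Data grouped per event type: event_name -> all of its timestamps.
        self.timestamps = defaultdict(list)

    def record(self, name, timestamp):
        # Store the timestamp next to those of similar events.
        self.timestamps[name].append(timestamp)
        # Detect an immediate repetition as the event arrives.
        if self.structure and self.structure[-1][0] == name:
            self.structure[-1][1] += 1
        else:
            self.structure.append([name, 1])


recorder = TraceRecorder()
events = ["init", "send", "send", "send", "recv", "send"]
for t, name in enumerate(events):
    recorder.record(name, float(t))

print(recorder.structure)
print(dict(recorder.timestamps))
```

In this sketch, a reader interested only in the timing of `send` events can scan one contiguous list instead of walking the whole event stream, which is the kind of access pattern the grouping described above is meant to speed up.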

Original language: English
Title of host publication: Proceedings of the 2025 IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331512569
DOIs
Publication status: Published - 1 Jan 2025
Event: 2025 IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2025 - Edinburgh, United Kingdom
Duration: 3 Sept 2025 → 5 Sept 2025

Publication series

Name: Proceedings of the 2025 IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2025

Conference

Conference: 2025 IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2025
Country/Territory: United Kingdom
City: Edinburgh
Period: 3/09/25 → 5/09/25

Keywords

  • performance analysis
  • trace format
