PALLAS: A Generic Trace Format for Large HPC Trace Analysis

  • CNRS UMR 5157 SAMOVAR
  • INRIA Institut National de Recherche en Informatique et en Automatique

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

Identifying performance bottlenecks in a parallel application is tedious, especially because it requires analyzing the behaviour of various software components, as bottlenecks may have several causes and symptoms. For example, a load imbalance may cause long MPI waiting times, or contention on disk may degrade the performance of I/O operations. Detecting a performance problem means investigating the execution of an application and applying several performance analysis techniques. To do so, one can use a tracing tool to collect information describing the behaviour of the application. At the end of the execution, a trace file in a specific format is available to the application user, who can use it to conduct a complete post-mortem investigation. Several challenges emerge from the generation and use of traces. Tracing may alter the performance of the application, and can create thousands of large trace files, especially at large scale. Most importantly, the post-mortem analysis needs to load these thousands of trace files into memory and process them. This quickly becomes impractical for large-scale applications, as memory gets exhausted and the number of open files exceeds the system capacity. In this paper, we propose PALLAS, a generic trace format tailored for conducting various post-mortem performance analyses of traces describing large executions of HPC applications. During the execution of the application, PALLAS collects events and detects their repetitions on-the-fly. When storing the trace to disk, PALLAS groups the data from similar events or groups of events together in order to later speed up trace reading. We demonstrate that the PALLAS online detection of the program structure does not significantly degrade the performance of the applications. Moreover, the PALLAS format allows faster trace analysis compared to other evaluated trace formats. Overall, the PALLAS trace format enables the interactive trace analysis that a user needs when investigating a performance problem.

Original language: English
Title: Proceedings - 2025 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 273-284
Number of pages: 12
Edition: 2025
ISBN (Electronic): 9798331532376
DOIs
Status: Published - 1 Jan 2025
Event: 39th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025 - Milan, Italy
Duration: 3 June 2025 to 7 June 2025

Conference

Conference: 39th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2025
Country/Territory: Italy
City: Milan
Period: 3/06/25 to 7/06/25
