TY - GEN
T1 - PALLAS
T2 - 2025 IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2025
AU - Guelque, Catherine
AU - Honore, Valentin
AU - Swartvagher, Philippe
AU - Trahay, Francois
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - Identifying performance bottlenecks in a parallel application is tedious, especially because it requires analyzing the behaviour of various software components, as bottlenecks may have several causes and symptoms. Tracing tools are used to collect information describing the behaviour of the application. At the end of the execution, a trace file in a specific format is available to the application user, who can use it to conduct a complete post-mortem investigation. However, these tools may alter the performance of the application and can create thousands of large trace files, especially at large scale. Most importantly, post-mortem analysis needs to load and process these files, which quickly becomes impractical for large-scale applications, as memory is exhausted and the number of open files exceeds the system capacity. We propose PALLAS, a generic trace format tailored for conducting various post-mortem performance analyses of traces describing large executions of HPC applications. During the execution of the application, PALLAS collects events and detects their repetitions on the fly. When storing the trace to disk, PALLAS groups together the data from similar events or groups of events in order to speed up later trace reading. We demonstrate that PALLAS's online detection of the program structure does not significantly degrade the performance of the applications. Moreover, the PALLAS format allows faster trace analysis than the other evaluated trace formats. Overall, the PALLAS trace format enables the interactive trace analysis that a user needs when investigating a performance problem.
AB - Identifying performance bottlenecks in a parallel application is tedious, especially because it requires analyzing the behaviour of various software components, as bottlenecks may have several causes and symptoms. Tracing tools are used to collect information describing the behaviour of the application. At the end of the execution, a trace file in a specific format is available to the application user, who can use it to conduct a complete post-mortem investigation. However, these tools may alter the performance of the application and can create thousands of large trace files, especially at large scale. Most importantly, post-mortem analysis needs to load and process these files, which quickly becomes impractical for large-scale applications, as memory is exhausted and the number of open files exceeds the system capacity. We propose PALLAS, a generic trace format tailored for conducting various post-mortem performance analyses of traces describing large executions of HPC applications. During the execution of the application, PALLAS collects events and detects their repetitions on the fly. When storing the trace to disk, PALLAS groups together the data from similar events or groups of events in order to speed up later trace reading. We demonstrate that PALLAS's online detection of the program structure does not significantly degrade the performance of the applications. Moreover, the PALLAS format allows faster trace analysis than the other evaluated trace formats. Overall, the PALLAS trace format enables the interactive trace analysis that a user needs when investigating a performance problem.
KW - performance analysis
KW - trace format
UR - https://www.scopus.com/pages/publications/105018049827
U2 - 10.1109/CLUSTERWorkshops65972.2025.11164202
DO - 10.1109/CLUSTERWorkshops65972.2025.11164202
M3 - Conference contribution
AN - SCOPUS:105018049827
T3 - Proceedings of the 2025 IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2025
BT - Proceedings of the 2025 IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 September 2025 through 5 September 2025
ER -