TY - GEN
T1 - NumaGiC
T2 - 20th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2015
AU - Gidra, Lokesh
AU - Thomas, Gaël
AU - Sopena, Julien
AU - Shapiro, Marc
AU - Nguyen, Nhan
N1 - Publisher Copyright: Copyright © 2015 ACM.
PY - 2015/3/14
Y1 - 2015/3/14
N2 - On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160 GB to 350 GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6× over NAPS (up to 5.4× over Parallel Scavenge).
AB - On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160 GB to 350 GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6× over NAPS (up to 5.4× over Parallel Scavenge).
KW - Garbage collection
KW - Multicore
KW - NUMA
UR - https://www.scopus.com/pages/publications/84939189058
U2 - 10.1145/2694344.2694361
DO - 10.1145/2694344.2694361
M3 - Conference contribution
AN - SCOPUS:84939189058
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 661
EP - 673
BT - ASPLOS 2015 - 20th International Conference on Architectural Support for Programming Languages and Operating Systems
PB - Association for Computing Machinery
Y2 - 14 March 2015 through 18 March 2015
ER -