TY - GEN
T1 - FARM
T2 - 44th IEEE International Conference on Distributed Computing Systems, ICDCS 2024
AU - Graf, Jérôme
AU - Chuprikov, Pavel
AU - Eugster, Patrick
AU - Jahnke, Patrick
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Modern data centers face growing workloads, putting accrued pressure on network monitoring solutions necessary for ensuring correct and efficient operation. Advances in network programmability have meanwhile led to yet more monitoring data being straightforwardly collected from switches, exacerbating bottlenecks in corresponding collection-centric approaches. This limits scalability and responsiveness, especially when several monitoring tasks are deployed side-by-side, as is common for network management. We present a novel and comprehensive selection-centric solution for network monitoring and management (M&M) called FARM that significantly simplifies the development and deployment of network M&M tasks while being effective and scalable. FARM's main novelty lies in its comprehensive design. Instead of focusing solely on individual parts of network monitoring, FARM takes a global perspective on the problem and aligns all of its components correspondingly: a strongly decentralized software architecture, a specifically designed programming model, and an integrated performance optimization framework. In short, FARM performs monitoring (re)actions locally on switches to the extent possible, using centralized components only if and when needed, and globally optimizes placement, considering placement constraints intrinsically expressed through its programming model as well as commonalities among tasks. Deployed in a production data center, FARM shows significant gains in responsiveness (up to 3427× faster over recent generic approaches and 4× faster over highly specialized solutions), and savings in network bandwidth (10000×) and computational effort. Placement optimization shows excellent scalability up to 10200 seeds across 1040 switches.
AB - Modern data centers face growing workloads, putting accrued pressure on network monitoring solutions necessary for ensuring correct and efficient operation. Advances in network programmability have meanwhile led to yet more monitoring data being straightforwardly collected from switches, exacerbating bottlenecks in corresponding collection-centric approaches. This limits scalability and responsiveness, especially when several monitoring tasks are deployed side-by-side, as is common for network management. We present a novel and comprehensive selection-centric solution for network monitoring and management (M&M) called FARM that significantly simplifies the development and deployment of network M&M tasks while being effective and scalable. FARM's main novelty lies in its comprehensive design. Instead of focusing solely on individual parts of network monitoring, FARM takes a global perspective on the problem and aligns all of its components correspondingly: a strongly decentralized software architecture, a specifically designed programming model, and an integrated performance optimization framework. In short, FARM performs monitoring (re)actions locally on switches to the extent possible, using centralized components only if and when needed, and globally optimizes placement, considering placement constraints intrinsically expressed through its programming model as well as commonalities among tasks. Deployed in a production data center, FARM shows significant gains in responsiveness (up to 3427× faster over recent generic approaches and 4× faster over highly specialized solutions), and savings in network bandwidth (10000×) and computational effort. Placement optimization shows excellent scalability up to 10200 seeds across 1040 switches.
KW - Data Center Networking
KW - Domain-specific Language
KW - Network Monitoring and Management
UR - https://www.scopus.com/pages/publications/85203143775
U2 - 10.1109/ICDCS60910.2024.00055
DO - 10.1109/ICDCS60910.2024.00055
M3 - Conference contribution
AN - SCOPUS:85203143775
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 520
EP - 530
BT - Proceedings - 2024 IEEE 44th International Conference on Distributed Computing Systems, ICDCS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 July 2024 through 26 July 2024
ER -