FARM: Comprehensive Data Center Network Monitoring and Management

Jérôme Graf, Pavel Chuprikov, Patrick Eugster, Patrick Jahnke

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Modern data centers face growing workloads, putting increasing pressure on the network monitoring solutions needed to ensure correct and efficient operation. Meanwhile, advances in network programmability have made it straightforward to collect even more monitoring data from switches, exacerbating the bottlenecks of collection-centric approaches. This limits scalability and responsiveness, especially when several monitoring tasks are deployed side by side, as is common in network management. We present FARM, a novel and comprehensive selection-centric solution for network monitoring and management (M&M) that significantly simplifies the development and deployment of network M&M tasks while being effective and scalable. FARM's main novelty lies in its comprehensive design. Instead of focusing solely on individual parts of network monitoring, FARM takes a global perspective on the problem and aligns all of its components accordingly: a strongly decentralized software architecture, a purpose-built programming model, and an integrated performance optimization framework. In short, FARM performs monitoring (re)actions locally on switches to the extent possible, uses centralized components only if and when needed, and globally optimizes placement, taking into account both the placement constraints expressed intrinsically through its programming model and the commonalities among tasks. Deployed in a production data center, FARM shows significant gains in responsiveness (up to 3427× faster than recent generic approaches and 4× faster than highly specialized solutions), as well as savings in network bandwidth (10000×) and computational effort. Placement optimization scales well, up to 10200 seeds across 1040 switches.
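The abstract names two inputs to placement, per-task placement constraints and commonalities among tasks, plus a local-first policy with a central fallback. The sketch below is not FARM's optimizer: it is a hypothetical greedy heuristic, standing in for a real global optimization, that only illustrates those ideas. The `Task` fields, the `place` function, the notion of a task "signature", and the `"central"` label are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical monitoring task (not FARM's API). Tasks with the same
    # signature and allowed-switch set are treated as sharing one instance.
    name: str
    signature: str
    demand: int
    allowed: frozenset


def place(tasks, capacity):
    """Greedy sketch: merge common tasks, place each shared instance on the
    allowed switch with the most remaining capacity, else fall back to a
    central component. A heuristic, not a global optimum."""
    remaining = dict(capacity)
    groups = {}
    for t in tasks:
        # Commonality: identical (signature, constraint) pairs share one instance.
        groups.setdefault((t.signature, t.allowed), []).append(t)

    placement = {}
    # Place the most demanding shared instances first.
    ordered = sorted(groups.items(), key=lambda kv: -max(m.demand for m in kv[1]))
    for (_sig, allowed), members in ordered:
        demand = max(m.demand for m in members)
        # Constraint: only switches the task may run on and that still fit it.
        candidates = [s for s in allowed if s in remaining and remaining[s] >= demand]
        if candidates:
            target = max(candidates, key=lambda s: (remaining[s], s))
            remaining[target] -= demand
        else:
            target = "central"  # hypothetical fallback label
        for m in members:
            placement[m.name] = target
    return placement


capacity = {"s1": 4, "s2": 2}
tasks = [
    Task("hh-a", "heavy_hitters", 3, frozenset({"s1", "s2"})),
    Task("hh-b", "heavy_hitters", 3, frozenset({"s1", "s2"})),
    Task("loss", "drops", 2, frozenset({"s2"})),
    Task("lat", "latency", 5, frozenset({"s1"})),
]
result = place(tasks, capacity)
print(result)
```

Here the two heavy-hitter tasks share a single instance on `s1`, the drop monitor is confined to `s2`, and the latency task, which fits on no allowed switch, falls back to the central component. A real system would replace the greedy loop with a solver that optimizes over all tasks jointly.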

Original language: English
Title of host publication: Proceedings - 2024 IEEE 44th International Conference on Distributed Computing Systems, ICDCS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 520-530
Number of pages: 11
ISBN (Electronic): 9798350386059
Publication status: Published - 1 Jan 2024
Externally published: Yes
Event: 44th IEEE International Conference on Distributed Computing Systems, ICDCS 2024 - Jersey City, United States
Duration: 23 Jul 2024 to 26 Jul 2024

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
ISSN (Print): 1063-6927
ISSN (Electronic): 2575-8411

Conference

Conference: 44th IEEE International Conference on Distributed Computing Systems, ICDCS 2024
Country/Territory: United States
City: Jersey City
Period: 23/07/24 to 26/07/24

Keywords

  • Data Center Networking
  • Domain-specific Language
  • Network Monitoring and Management
