Aquarius - Enable Fast, Scalable, Data-Driven Service Management in the Cloud

Zhiyuan Yao, Yoann Desmouceaux, Juan Antonio Cordero-Fuertes, Mark Townsley, Thomas Clausen

Research output: Contribution to journal › Article › peer-review

Abstract

To dynamically manage and update networking policies in cloud data centers, Virtual Network Functions (VNFs) use, and therefore actively collect, networking state information. In the process, they incur additional control signaling and management overhead, especially in larger data centers. Meanwhile, VNFs in production favor simple, distributed heuristics over advanced learning algorithms, to avoid intractable additional processing latency under high-performance, low-latency networking constraints. This paper identifies the challenges of deploying learning algorithms in cloud data centers and proposes Aquarius to bridge the gap in applying machine learning (ML) techniques to distributed systems and service management. Aquarius passively yet efficiently gathers reliable observations and enables the use of ML techniques to collect, infer, and supply accurate networking state information without incurring additional signaling and management overhead. It offers fine-grained, programmable visibility to distributed VNFs and enables both open- and closed-loop control over networking systems. This paper illustrates the use of Aquarius with a traffic classifier, an auto-scaling system, and a load balancer, and demonstrates the use of three ML paradigms within Aquarius (unsupervised, supervised, and reinforcement learning) for network state inference and service management. Testbed evaluations show that Aquarius improves network state visibility and brings notable performance gains in various scenarios with low overhead.

Original language: English
Pages (from-to): 4028-4044
Number of pages: 17
Journal: IEEE Transactions on Network and Service Management
Volume: 19
Issue number: 4
Publication status: Published - 1 Dec 2022

Keywords

  • Service management
  • cloud
  • data-driven
  • high performance network
  • performance evaluation
