
Spark-based cloud data analytics using multi-objective optimization

  • Fei Song
  • Khaled Zaouk
  • Chenghao Lyu
  • Arnab Sinha
  • Qi Fan
  • Yanlei Diao
  • Prashant Shenoy
  • Ecole Polytechnique
  • UMass Amherst

Research output: Chapter in Book/Report/Anthology/Collection › Conference contribution › Peer-reviewed

Abstract

Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take task objectives such as user performance goals and budgetary constraints and automatically configure an analytic job to achieve these objectives. This paper presents UDAO, a Spark-based Unified Data Analytics Optimizer that can automatically determine a cluster configuration with a suitable number of cores, as well as other system parameters, that best meets the task objectives. At the core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of configurations to reveal tradeoffs between different objectives, recommends a new Spark configuration that best explores such tradeoffs, and employs novel optimizations to enable such recommendations within a few seconds. Detailed experiments using benchmark workloads show that our MOO techniques provide a 2-50x speedup over existing MOO methods while offering good coverage of the Pareto frontier. Compared to Ottertune, a state-of-the-art performance tuning system, UDAO recommends Spark configurations that yield a 26%-49% reduction in the running time of the TPCx-BB benchmark while adapting to different user preferences on multiple objectives.
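The two steps named in the abstract, computing a Pareto optimal set of configurations and then recommending one of them according to user preferences, can be illustrated with a small sketch. This is not UDAO's algorithm: the candidate configurations, their latency and cost values, and the function names are hypothetical, the Pareto filter is a brute-force dominance check, and the recommendation uses a weighted utopia-nearest rule (normalize each objective, then pick the point closest to the ideal point) as one common way to turn a frontier into a single choice.

```python
from typing import List, Tuple

# Hypothetical candidate: (num_cores, latency_seconds, cost_dollars); both objectives are minimized.
Config = Tuple[int, float, float]


def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on at least one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only configurations that no other configuration dominates (O(n^2) brute force)."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def recommend(front: List[Config], w_latency: float, w_cost: float) -> Config:
    """Pick the Pareto point with the smallest weighted squared distance to the utopia point."""
    lat_lo, lat_hi = min(c[1] for c in front), max(c[1] for c in front)
    cost_lo, cost_hi = min(c[2] for c in front), max(c[2] for c in front)

    def norm(x: float, lo: float, hi: float) -> float:
        # Scale each objective to [0, 1] so latency and cost are comparable.
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    def score(c: Config) -> float:
        # The utopia point is (0, 0) after normalization.
        return (w_latency * norm(c[1], lat_lo, lat_hi) ** 2
                + w_cost * norm(c[2], cost_lo, cost_hi) ** 2)

    return min(front, key=score)


# Made-up measurements for illustration only.
candidates = [
    (4, 120.0, 0.40),
    (8, 70.0, 0.55),
    (16, 45.0, 0.90),
    (32, 40.0, 1.60),
    (8, 90.0, 0.70),   # dominated by (8, 70.0, 0.55)
    (16, 60.0, 1.00),  # dominated by (16, 45.0, 0.90)
]

front = pareto_front(candidates)
print(front)                        # the four non-dominated configurations
print(recommend(front, 0.9, 0.1))   # latency-focused user -> (16, 45.0, 0.9)
print(recommend(front, 0.1, 0.9))   # cost-focused user -> (8, 70.0, 0.55)
```

Changing only the weights moves the recommendation along the same frontier, which is the kind of preference-driven tradeoff the abstract describes; a real optimizer would obtain latency and cost from learned models rather than a fixed list.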

Original language: English
Title of host publication: Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
Publisher: IEEE Computer Society
Pages: 396-407
Number of pages: 12
ISBN (Electronic): 9781728191843
DOIs
Publication status: Published - 1 Apr 2021
Externally published: Yes
Event: 37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Online, Chania, Greece
Duration: 19 Apr 2021 → 22 Apr 2021

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2021-April
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 37th IEEE International Conference on Data Engineering, ICDE 2021
Country/Territory: Greece
City: Virtual, Online, Chania
Period: 19/04/21 → 22/04/21
