Towards scalable one-pass analytics using MapReduce

  • University of Massachusetts

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

An integral part of many data-intensive applications is the need to collect and analyze enormous datasets efficiently. Concurrent with such application needs is the increasing adoption of MapReduce as a programming model for processing large datasets using a cluster of machines. Current MapReduce systems, however, require the data set to be loaded into the cluster before running analytical queries, and thereby incur high delays to start query processing. Furthermore, existing systems are geared towards batch processing. In this paper, we seek to answer a fundamental question: what architectural changes are necessary to bring the benefits of the MapReduce computation model to incremental, one-pass analytics, i.e., to support stream processing and online aggregation? To answer this question, we first conduct a detailed empirical performance study of current MapReduce implementations including Hadoop and MapReduce Online using a variety of workloads. By doing so, we identify several drawbacks of existing systems for one-pass analytics. Based on the insights from our study, we list key design requirements for incremental one-pass analytics and argue for architectural changes of MapReduce systems to overcome their current limitations. We conclude by sketching an initial design of our new MapReduce-based platform for incremental one-pass analytics and showing promising preliminary results.

Original language: English
Title: 2011 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2011
Pages: 1102-1111
Number of pages: 10
Status: Published - 20 Dec 2011
Externally published: Yes
Event: 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011 - Anchorage, AK, United States
Duration: 16 May 2011 to 20 May 2011

Publication series

Name: IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum

Conference

Conference: 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011
Country/Territory: United States
City: Anchorage, AK
Period: 16/05/11 to 20/05/11
