Distributed Adaptive Model Rules for mining big data streams

  • Anh Thu Vu
  • Gianmarco De Francisci Morales
  • Joao Gama
  • Albert Bifet

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Decision rules are among the most expressive data mining models. We propose the first distributed streaming algorithm to learn decision rules for regression tasks. The algorithm is available in SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It uses a hybrid of vertical and horizontal parallelism to distribute Adaptive Model Rules (AMRules) on a cluster. The decision rules built by AMRules are comprehensible models, where the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. Our evaluation shows that this implementation scales well with respect to CPU and memory consumption. On a small commodity Samza cluster of 9 nodes, it can handle a rate of more than 30,000 instances per second and achieve a speedup of up to 4.7x over the sequential version.
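
The rule structure described in the abstract (an antecedent that is a conjunction of attribute conditions, and a consequent that is a linear combination of the attributes) can be illustrated with a minimal sketch. This is not the SAMOA or AMRules implementation: the class names `Condition` and `Rule`, the attribute names, and all thresholds and weights below are hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Dict, List

# An instance maps attribute names to numeric values (hypothetical representation).
Instance = Dict[str, float]


@dataclass
class Condition:
    """One test on a single attribute, e.g. temperature > 20."""
    attribute: str
    op: str          # either "<=" or ">"
    threshold: float

    def holds(self, x: Instance) -> bool:
        value = x[self.attribute]
        return value <= self.threshold if self.op == "<=" else value > self.threshold


@dataclass
class Rule:
    """Antecedent: conjunction of conditions. Consequent: linear model."""
    conditions: List[Condition]
    weights: Dict[str, float]
    intercept: float

    def covers(self, x: Instance) -> bool:
        # The antecedent is satisfied only if every condition holds.
        return all(c.holds(x) for c in self.conditions)

    def predict(self, x: Instance) -> float:
        # The consequent is a linear combination of the attribute values.
        return self.intercept + sum(w * x[a] for a, w in self.weights.items())


# Illustrative rule: IF temperature > 20 AND humidity <= 0.5
#                    THEN y = 1.0 + 2.0 * temperature - 4.0 * humidity
rule = Rule(
    conditions=[
        Condition("temperature", ">", 20.0),
        Condition("humidity", "<=", 0.5),
    ],
    weights={"temperature": 2.0, "humidity": -4.0},
    intercept=1.0,
)

x = {"temperature": 25.0, "humidity": 0.25}
if rule.covers(x):
    print(rule.predict(x))  # 1.0 + 50.0 - 1.0 = 50.0
```

A rule of this form stays readable because both parts can be inspected directly: the conditions state when the rule applies, and the weights state how each attribute contributes to the prediction. How AMRules learns, expands, and distributes such rules is not covered by this sketch.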

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, Big Data 2014
Editors: Jimmy Lin, Jian Pei, Xiaohua Tony Hu, Wo Chang, Raghunath Nambiar, Charu Aggarwal, Nick Cercone, Vasant Honavar, Jun Huan, Bamshad Mobasher, Saumyadipta Pyne
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 345-353
Number of pages: 9
ISBN (Electronic): 9781479956654
Publication status: Published - 1 Jan 2014
Externally published: Yes
Event: 2nd IEEE International Conference on Big Data, Big Data 2014 - Washington, United States
Duration: 27 Oct 2014 to 30 Oct 2014

Publication series

Name: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Conference

Conference: 2nd IEEE International Conference on Big Data, Big Data 2014
Country/Territory: United States
City: Washington
Period: 27/10/14 to 30/10/14
