Low-latency multi-threaded ensemble learning for dynamic big data streams

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Real-time mining of evolving data streams involves new challenges when targeting today's application domains such as the Internet of Things: increasing volume, velocity and volatility require data to be processed on-the-fly with fast reaction and adaptation to changes. This paper presents a high-performance, scalable design for decision trees and ensemble combinations that makes use of the vector SIMD and multicore capabilities available in modern processors to provide the required throughput and accuracy. The proposed design offers very low latency and good scalability with the number of cores on commodity hardware when compared to other state-of-the-art implementations. On an Intel i7-based system, processing a single decision tree is 6x faster than MOA (Java) and 7x faster than StreamDM (C++), two well-known reference implementations. On the same system, using the 6 cores (and 12 hardware threads) available makes it possible to process an ensemble of 100 learners 85x faster than MOA while providing the same accuracy. Furthermore, our solution is highly scalable: on an Intel Xeon socket with large core counts, the proposed ensemble design achieves up to 16x speedup when employing 24 cores with respect to a single-threaded execution.

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 223-232
Number of pages: 10
ISBN (Electronic): 9781538627143
Publication status: Published - 1 Jul 2017
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Conference

Conference: 5th IEEE International Conference on Big Data, Big Data 2017
Country/Territory: United States
City: Boston
Period: 11/12/17 to 14/12/17

Keywords

  • Data Streams
  • High performance
  • Hoeffding Tree
  • Low-latency
  • Random Forests
