Improving parallel performance of ensemble learners for streaming data through data locality with mini-batching

  • Guilherme Cassales
  • Heitor Gomes
  • Albert Bifet
  • Bernhard Pfahringer
  • Hermes Senger
  • Universidade Federal de São Carlos
  • University of Waikato

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Machine learning techniques have been employed in virtually all domains in the past few years. New applications demand the ability to cope with dynamic environments such as data streams with transient behavior. Such environments impose new requirements, such as incrementally processing incoming data instances in a single pass under both memory and time constraints. Furthermore, prediction models often need to adapt to concept drifts observed in non-stationary data streams. Ensemble learning comprises a class of stream mining algorithms that have achieved remarkable prediction performance in this scenario. Implemented as a set of individual component classifiers whose predictions are combined to predict new incoming instances, ensembles are naturally amenable to task parallelism. Despite its relevance, an efficient implementation of ensemble algorithms is still challenging. For example, the dynamic data structures used to model non-stationary data behavior and detect concept drifts cause inefficient memory usage patterns and poor cache performance in multi-core environments. In this paper, we propose a mini-batching strategy that can significantly reduce cache misses and improve the performance of several ensemble algorithms for stream mining in multi-core environments. We assess our strategy on four different state-of-the-art ensemble algorithms using four widely used machine learning benchmark datasets with varied characteristics. Results on two different hardware platforms show speedups of up to 5x on 8-core processors with ensembles of 100 and 150 learners. The benefits come at the cost of changes in predictive performance.
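
The mini-batching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `MajorityClassLearner`, the functions `poisson1`, `process_batch` and `run_minibatched`, and the parameters `batch_size` and `workers` are all hypothetical names chosen for illustration, and the Poisson(1) instance weighting is the usual online-bagging resampling, assumed here because the keywords mention bagging. The point is the loop order: each learner consumes a whole mini-batch before the next task starts, so its model stays hot in cache, and every instance in a batch is predicted with the pre-batch model, which is one plausible source of the change in predictive performance.

```python
import math
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def poisson1(rng):
    """Draw from Poisson(1) using Knuth's multiplication method."""
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class MajorityClassLearner:
    """Toy incremental learner (hypothetical stand-in for a real base classifier)."""

    def __init__(self, seed):
        self.counts = Counter()
        self.rng = random.Random(seed)  # private RNG, so tasks share no state

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        k = poisson1(self.rng)  # online-bagging weight (assumption)
        if k > 0:
            self.counts[y] += k


def process_batch(learner, batch):
    """One task per learner: predict the whole mini-batch, then train on it."""
    votes = [learner.predict(x) for x, _ in batch]  # uses the pre-batch model
    for x, y in batch:
        learner.learn(x, y)
    return votes


def run_minibatched(ensemble, stream, batch_size, workers=4):
    """Test-then-train over a finite stream, one mini-batch at a time."""
    predictions = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(stream), batch_size):
            batch = stream[start:start + batch_size]
            all_votes = list(pool.map(lambda l: process_batch(l, batch), ensemble))
            # Combine the per-learner votes by simple majority for each instance.
            for i in range(len(batch)):
                tally = Counter(v[i] for v in all_votes if v[i] is not None)
                predictions.append(tally.most_common(1)[0][0] if tally else None)
    return predictions
```

In CPython the global interpreter lock prevents these threads from running Python code in parallel, so the sketch shows only the task structure and access pattern, not a speedup. Setting `batch_size=1` recovers the per-instance schedule, in which every learner is touched once per incoming instance.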

Original language: English
Title of host publication: Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 138-146
Number of pages: 9
ISBN (Electronic): 9781728176499
Publication status: Published - 1 Dec 2020
Externally published: Yes
Event: 22nd IEEE International Conference on High Performance Computing and Communications, 18th IEEE International Conference on Smart City and 6th IEEE International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020 - Virtual, Fiji, Fiji
Duration: 14 Dec 2020 to 16 Dec 2020

Publication series

Name: Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020

Conference

Conference: 22nd IEEE International Conference on High Performance Computing and Communications, 18th IEEE International Conference on Smart City and 6th IEEE International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020
Country/Territory: Fiji
City: Virtual, Fiji
Period: 14/12/20 to 16/12/20

Keywords

  • Multicore task-parallelism
  • bagging algorithms
  • data-stream learning
  • ensemble learners
