
Improving parallel performance of ensemble learners for streaming data through data locality with mini-batching

  • Guilherme Cassales
  • Heitor Gomes
  • Albert Bifet
  • Bernhard Pfahringer
  • Hermes Senger
  • Universidade Federal de São Carlos
  • University of Waikato

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

Machine learning techniques have been employed in virtually all domains in the past few years. New applications demand the ability to cope with dynamic environments such as data streams with transient behavior. Such environments impose new requirements, such as incrementally processing incoming data instances in a single pass under both memory and time constraints. Furthermore, prediction models often need to adapt to concept drifts observed in non-stationary data streams. Ensemble learning comprises a class of stream mining algorithms that have achieved remarkable prediction performance in this scenario. Implemented as a set of individual component classifiers whose predictions are combined to predict new incoming instances, ensembles are naturally amenable to task parallelism. Despite its relevance, an efficient implementation of ensemble algorithms is still challenging. For example, the dynamic data structures used to model non-stationary data behavior and detect concept drifts cause inefficient memory usage patterns and poor cache performance in multi-core environments. In this paper, we propose a mini-batching strategy that can significantly reduce cache misses and improve the performance of several ensemble algorithms for stream mining in multi-core environments. We assess our strategy on four state-of-the-art ensemble algorithms using four widely used machine learning benchmark datasets with varied characteristics. Results from two different hardware platforms show speedups of up to 5X on 8-core processors with ensembles of 100 and 150 learners. The benefits come at the cost of changes in predictive performance.
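The abstract does not spell out the mechanics of the mini-batching strategy, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes a test-then-train evaluation loop and uses a hypothetical toy learner (`MajorityClassLearner`); the function names and the `batch_size` parameter are likewise invented for the example. The idea it shows is the loop reordering: rather than passing each instance through every ensemble member in turn, instances are buffered and each member processes the whole buffer while its own model state is being accessed. The loops here run sequentially; the loop over ensemble members is the one that task parallelism would distribute across cores.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]  # (features, label)


class MajorityClassLearner:
    """Hypothetical toy incremental learner standing in for a real component classifier."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, x: Sequence[float]) -> int:
        # Most frequent label seen so far; 0 before any training.
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x: Sequence[float], y: int) -> None:
        self.counts[y] += 1


def majority_vote(votes: List[int]) -> int:
    return Counter(votes).most_common(1)[0][0]


def run_instance_at_a_time(ensemble: List[MajorityClassLearner],
                           stream: Iterable[Instance]) -> List[int]:
    """Baseline: every instance visits every learner before the next instance arrives."""
    predictions = []
    for x, y in stream:
        predictions.append(majority_vote([m.predict(x) for m in ensemble]))
        for m in ensemble:
            m.learn(x, y)
    return predictions


def run_mini_batched(ensemble: List[MajorityClassLearner],
                     stream: Iterable[Instance],
                     batch_size: int) -> List[int]:
    """Buffer batch_size instances, then let each learner process the whole buffer."""
    predictions: List[int] = []
    batch: List[Instance] = []

    def flush() -> None:
        # Test phase: each learner predicts the entire batch in one pass.
        votes = [[m.predict(x) for x, _ in batch] for m in ensemble]
        for i in range(len(batch)):
            predictions.append(majority_vote([v[i] for v in votes]))
        # Train phase: each learner consumes the entire batch in one pass.
        for m in ensemble:
            for x, y in batch:
                m.learn(x, y)
        batch.clear()

    for instance in stream:
        batch.append(instance)
        if len(batch) == batch_size:
            flush()
    if batch:  # process the final, possibly partial, batch
        flush()
    return predictions
```

In this sketch both variants perform the same training updates, so the final models are identical, but the mini-batched predictions within a batch are made before any instance of that batch has been learned. On a stream of four instances all labeled 1, the baseline predicts `[0, 1, 1, 1]` while the mini-batched run with `batch_size=4` predicts `[0, 0, 0, 0]`. This is one plausible reading of the changes in predictive performance mentioned in the abstract, offered here as an assumption rather than as the paper's explanation.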

Original language: English
Title: Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 138-146
Number of pages: 9
ISBN (Electronic): 9781728176499
DOIs
State: Published - 1 Dec 2020
Externally published: Yes
Event: 22nd IEEE International Conference on High Performance Computing and Communications, 18th IEEE International Conference on Smart City and 6th IEEE International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020 - Virtual, Fiji, Fiji
Duration: 14 Dec 2020 to 16 Dec 2020

Publication series

Name: Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020

Conference

Conference: 22nd IEEE International Conference on High Performance Computing and Communications, 18th IEEE International Conference on Smart City and 6th IEEE International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020
Country/Territory: Fiji
City: Virtual, Fiji
Period: 14/12/20 to 16/12/20
