TY - GEN
T1 - Streaming random patches for evolving data stream classification
AU - Gomes, Heitor Murilo
AU - Read, Jesse
AU - Bifet, Albert
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/11/1
Y1 - 2019/11/1
AB - Ensemble methods are a popular choice for learning from evolving data streams. This popularity is due to (i) the ability to simulate simple yet successful ensemble learning strategies, such as bagging and random forests; (ii) the possibility of incorporating drift detection and recovery in conjunction with the ensemble algorithm; and (iii) the availability of efficient incremental base learners, such as Hoeffding trees. In this work, we introduce the Streaming Random Patches (SRP) algorithm, an ensemble method specially adapted to stream classification, which combines random subspaces and online bagging. We provide theoretical insights and empirical results illustrating different aspects of SRP. In particular, we explain how the widely adopted incremental Hoeffding trees are not, in fact, unstable learners, unlike their batch counterparts, and how this fact significantly influences the design and performance of ensemble methods. We compare SRP against state-of-the-art ensemble variants for streaming data on a multitude of datasets. The results show that SRP produces high predictive performance on both real and synthetic datasets. In addition, we analyze the diversity over time and the average tree depth, which provide insights into the differences between local subspace randomization (as in random forests) and global subspace randomization (as in random subspaces).
KW - Ensemble Learning
KW - Random Patches
KW - Random Subspaces
KW - Stream Data Mining
UR - https://www.scopus.com/pages/publications/85078882878
U2 - 10.1109/ICDM.2019.00034
DO - 10.1109/ICDM.2019.00034
M3 - Conference contribution
AN - SCOPUS:85078882878
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 240
EP - 249
BT - Proceedings - 19th IEEE International Conference on Data Mining, ICDM 2019
A2 - Wang, Jianyong
A2 - Shim, Kyuseok
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 19th IEEE International Conference on Data Mining, ICDM 2019
Y2 - 8 November 2019 through 11 November 2019
ER -