Scalable and efficient multi-label classification for evolving data streams

Research output: Contribution to journal › Article › peer-review

Abstract

Many challenging real-world problems involve multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as classifiers must handle huge numbers of examples and adapt to change using limited time and memory, while being ready to predict at any point. This paper proposes a new experimental framework for learning and evaluating on multi-label data streams, and uses it to study the performance of various methods. From this study, we develop a multi-label Hoeffding tree with multi-label classifiers at the leaves. We show empirically that this method is well suited to this challenging task. Using our new framework, which allows us to generate realistic multi-label data streams with concept drift (as well as to use real data), we compare our method with a selection of baseline methods, as well as with new learning methods from the literature, and show that our Hoeffding tree method is both fast and more accurate.
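
For readers unfamiliar with Hoeffding trees, the split test they generally rely on can be sketched as follows. This is a minimal illustration of the standard Hoeffding bound, not the implementation described in the paper; the function names (`hoeffding_bound`, `should_split`) and the example values for `delta` and the gain range are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range lies within this epsilon of the mean
    observed over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Hypothetical helper: split a leaf once the observed gap between the two
    best candidate attributes exceeds the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative values only: gain range 1.0, delta = 1e-7.
few_examples = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=100)
more_examples = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
```

Because the bound shrinks as more examples arrive, the same observed gap that is inconclusive after 100 examples can justify a split after 1000, which is what lets such a tree grow incrementally from a stream without storing past examples.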

Original language: English
Pages (from-to): 243-272
Number of pages: 30
Journal: Machine Learning
Volume: 88
Issue number: 1-2
Publication status: Published - 1 Jul 2012
Externally published: Yes

Keywords

  • Data streams classification
  • Multi-label classification
