Incremental ensemble classifier addressing non-stationary fast data streams

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Classification of data points in a data stream poses a fundamentally different set of challenges than data mining on static data. While streaming data is often placed into the context of 'Big Data' (or more specifically 'Fast Data'), wherein one-pass algorithms are used, true data streams offer additional hurdles due to their dynamic, evolving, and non-stationary nature. During the stream, the available labels (or concepts) often change, and a concept's definition in the feature space can also evolve (or drift) over time. The core issue is that the hidden generative function of the data is not constant, but rather evolves over time. This is known as a non-stationary distribution. In this paper, we describe a new approach to using ensembles for stream classification. While the core method is straightforward, it is specifically designed to adapt quickly, and with very little overhead, to the dynamic and evolving nature of data streams generated from non-stationary functions. Our method, M3, is based on a weighted majority ensemble of heterogeneous model types in which model weights are updated on-line using Reinforcement Learning techniques. We compare our method with current leading algorithms as implemented in the Massive Online Analysis (MOA) framework, using UCI benchmark and synthetic stream generator data sets, and find that our method shows particularly strong gains over the baseline method when ground truth is of limited availability to the classifiers.
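To make the idea of a weighted majority ensemble with on-line weight updates concrete, the following is a minimal sketch only. It is not the M3 method: the abstract does not specify the Reinforcement Learning update rule, so the classic multiplicative-penalty rule from the Weighted Majority algorithm is swapped in here. The class name `WeightedMajorityEnsemble`, the `penalty` parameter, and the convention that each model is a plain callable are all assumptions made for illustration.

```python
from collections import defaultdict


class WeightedMajorityEnsemble:
    """Illustrative weighted majority vote over heterogeneous predictors.

    Hypothetical sketch: uses a simple multiplicative penalty, NOT the
    Reinforcement Learning update used by M3 (which the abstract does
    not describe in detail).
    """

    def __init__(self, models, penalty=0.5):
        # Each model is assumed to be a callable mapping x -> label.
        self.models = list(models)
        self.weights = [1.0] * len(self.models)
        self.penalty = penalty  # factor in (0, 1) applied on a mistake

    def predict(self, x):
        # Sum the weights of the models voting for each label.
        votes = defaultdict(float)
        for model, weight in zip(self.models, self.weights):
            votes[model(x)] += weight
        return max(votes, key=votes.get)

    def update(self, x, y_true):
        # Called only when a ground-truth label is available for x.
        for i, model in enumerate(self.models):
            if model(x) != y_true:
                self.weights[i] *= self.penalty
        # Renormalise so the weights stay on a comparable scale.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
```

In a streaming loop, `predict` would be called for every arriving point, while `update` would run only for the points whose labels eventually arrive. Models that keep voting for an outdated concept lose influence after a few labelled examples, which is the general behaviour the abstract attributes to an adaptive ensemble.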

Original language: English
Title of host publication: Proceedings - 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
Editors: Zhi-Hua Zhou, Wei Wang, Ravi Kumar, Hannu Toivonen, Jian Pei, Joshua Zhexue Huang, Xindong Wu
Publisher: IEEE Computer Society
Pages: 716-723
Number of pages: 8
Edition: January
ISBN (Electronic): 9781479942749
Publication status: Published - 26 Jan 2015
Externally published: Yes
Event: 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014 - Shenzhen, China
Duration: 14 Dec 2014 → …

Publication series

Name: IEEE International Conference on Data Mining Workshops, ICDMW
Number: January
Volume: 2015-January
ISSN (Print): 2375-9232
ISSN (Electronic): 2375-9259

Conference

Conference: 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
Country/Territory: China
City: Shenzhen
Period: 14/12/14 → …

Keywords

  • Big Data
  • Fast Data
  • Stream mining
  • classifier
  • non-stationary distribution
