Recurring concept memory management in data streams: exploiting data stream concept evolution to improve performance and transparency

  • Ben Halstead
  • Yun Sing Koh
  • Patricia Riddle
  • Russel Pears
  • Mykola Pechenizkiy
  • Albert Bifet
  • University of Auckland
  • Auckland University of Technology
  • Technical University of Eindhoven

Research output: Contribution to journal › Article › peer-review

Abstract

A data stream is a sequence of observations produced by a generating process that may evolve over time. In such a time-varying stream, the relationship between input features and labels, known as the concept, can change. Adapting to a change in concept is most often done by discarding the current classifier and incrementally building a new one. Many systems additionally store and reuse previously built models to adapt more efficiently when stream conditions drift back to a previously seen state. Reusing a model offers better classification performance than rebuilding, and provides an indicator of the hidden state of the generating process, which we refer to as transparency. When only a subset of past models can be stored for reuse, for example due to memory constraints, choosing which models to store for optimal future reuse is an important problem. Current methods of evaluating which models to store use valuation policies such as age, time since last use, accuracy and diversity. These policies are often not optimal, losing predictive performance by undervaluing complex models. We propose a new valuation policy based on advantage, the misclassifications avoided by reusing a model rather than training a new one, which more accurately reflects the true value of storing a model. We evaluate our method on synthetic and real-world data, including a real-world air pollution dataset. Our results show accuracy increases of up to 6% using our valuation policy, while preserving transparency.
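
To make the idea of advantage concrete, the following is a minimal sketch and not the authors' implementation. It assumes that, for each stored model, we already hold two error counts over the same observations: the errors made when the stored model was reused, and the errors a freshly trained model made (or was estimated to make). The class `StoredModel`, the function `evict_lowest_advantage`, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoredModel:
    """Hypothetical record kept for each model in the repository."""
    name: str
    reuse_errors: int    # misclassifications when this stored model was reused
    retrain_errors: int  # misclassifications of a newly trained model on the same data

    @property
    def advantage(self) -> int:
        # Misclassifications avoided by reusing rather than retraining.
        return self.retrain_errors - self.reuse_errors


def evict_lowest_advantage(repository, capacity):
    """Keep the `capacity` models with the highest advantage.

    Returns a (kept, evicted) pair of lists.
    """
    ranked = sorted(repository, key=lambda m: m.advantage, reverse=True)
    return ranked[:capacity], ranked[capacity:]


# Illustrative numbers only (assumed, not taken from the paper).
repository = [
    StoredModel("A", reuse_errors=40, retrain_errors=120),
    StoredModel("B", reuse_errors=55, retrain_errors=60),
    StoredModel("C", reuse_errors=90, retrain_errors=150),
]

kept, evicted = evict_lowest_advantage(repository, capacity=2)
print([m.name for m in kept], [m.name for m in evicted])
```

In this toy repository, model B makes fewer errors than model C, so an accuracy-based policy would prefer to keep B. Its concept is cheap to relearn, however (60 errors when retraining versus 55 when reusing), so storing it saves little. Model C is less accurate but avoids far more misclassifications relative to retraining, which is the kind of complex model the abstract says existing policies tend to undervalue.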

Original language: English
Pages (from-to): 796-836
Number of pages: 41
Journal: Data Mining and Knowledge Discovery
Volume: 35
Issue number: 3
Publication status: Published - 1 May 2021

Keywords

  • Data streams
  • Memory management
  • Model valuation policy
