
Recurring concept memory management in data streams: exploiting data stream concept evolution to improve performance and transparency

  • Ben Halstead
  • Yun Sing Koh
  • Patricia Riddle
  • Russel Pears
  • Mykola Pechenizkiy
  • Albert Bifet
  • University of Auckland
  • Auckland University of Technology
  • Technical University of Eindhoven

Research output: Contribution to journal > Article > Peer-reviewed

Abstract

A data stream is a sequence of observations produced by a generating process which may evolve over time. In such a time-varying stream the relationship between input features and labels, or concepts, can change. Adapting to changes in concept is most often done by destroying and incrementally rebuilding the current classifier. Many systems additionally store and reuse previously built models to more efficiently adapt when stream conditions drift to a previously seen state. Reusing a model offers increased classification performance over rebuilding, and provides an indicator, or transparency, into the hidden state of the generating process. When only a subset of past models can be stored for reuse, for example due to memory constraints, the choice of which models to store for optimal future reuse is an important problem. Current methods of evaluating which models to store use valuation policies such as age, time since last use, accuracy and diversity. These policies are often not optimal, losing predictive performance by undervaluing complex models. We propose a new valuation policy based on advantage, the misclassifications avoided by reusing a model rather than training a new model, which more accurately reflects the true value of model storage. We evaluate our method on synthetic and real-world data, including a real-world air pollution dataset. Our results show accuracy increases of up to 6% using our valuation policy, while preserving transparency.
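
The advantage-based valuation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class names, the `store` method, and the way rebuild errors are supplied are all hypothetical, and the sketch simply assumes that an estimate of the misclassifications a freshly trained model would have made is available for each stored model. Under those assumptions, a memory-limited repository evicts the model with the lowest advantage first.

```python
from dataclasses import dataclass


@dataclass
class StoredModel:
    """Bookkeeping for one stored classifier (hypothetical structure)."""
    name: str
    # Misclassifications observed while this model was being reused.
    reuse_errors: int = 0
    # Assumed estimate of the misclassifications a newly trained model
    # would have made on the same observations.
    rebuild_errors: int = 0

    @property
    def advantage(self) -> int:
        """Misclassifications avoided by reusing instead of rebuilding."""
        return self.rebuild_errors - self.reuse_errors


class AdvantageRepository:
    """Fixed-capacity model store that evicts the lowest-advantage model."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.models: dict[str, StoredModel] = {}

    def store(self, model: StoredModel) -> list[str]:
        """Add a model, then evict until within capacity.

        Returns the names of any evicted models.
        """
        self.models[model.name] = model
        evicted = []
        while len(self.models) > self.capacity:
            worst = min(self.models.values(), key=lambda m: m.advantage)
            del self.models[worst.name]
            evicted.append(worst.name)
        return evicted


repo = AdvantageRepository(capacity=2)
repo.store(StoredModel("A", reuse_errors=10, rebuild_errors=50))  # advantage 40
repo.store(StoredModel("B", reuse_errors=18, rebuild_errors=20))  # advantage 2
evicted = repo.store(StoredModel("C", reuse_errors=5, rebuild_errors=30))  # advantage 25
print(evicted)  # ['B']
```

With a capacity of two, storing the third model forces an eviction, and model B is dropped because reusing it avoids the fewest misclassifications. A policy based on age or time since last use could instead have dropped A, the model whose reuse saves the most errors.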

Original language: English
Pages (from-to): 796-836
Number of pages: 41
Journal: Data Mining and Knowledge Discovery
Volume: 35
Issue number: 3
Status: Published - 1 May 2021
