A Probabilistic Framework for Adapting to Changing and Recurring Concepts in Data Streams

  • Ben Halstead
  • Yun Sing Koh
  • Patricia Riddle
  • Mykola Pechenizkiy
  • Albert Bifet

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for current data. Learning from irrelevant experience describing a different concept can degrade performance. A system learning from streaming data must identify which recent experience is irrelevant when conditions change and which past experience is relevant when concepts reoccur, e.g., when weather events or financial patterns repeat. Existing streaming approaches either do not consider that the relevance of experience changes over time and thus cannot handle concept drift, or consider only the recency of experience and thus cannot handle recurring concepts, or evaluate relevance only sparsely and thus fail when concept drift is missed. To enable learning in changing conditions, we propose SELeCT, a probabilistic method for continuously evaluating the relevance of past experience. SELeCT maintains a distinct internal state for each concept, representing relevant experience with a unique classifier. We propose a Bayesian algorithm for estimating state relevance, combining the likelihood of drawing recent observations from a given state with a transition pattern prior based on the system's current state. The current state is continuously maintained using a Hoeffding-bound-based algorithm which, unlike existing methods, guarantees that every observation is classified using the state estimated as the most relevant, while also maintaining temporal stability. We find that SELeCT is able to choose experience relevant to ground-truth concepts with recall and precision above 0.9, significantly outperforming existing methods and coming close to a theoretical optimum, which leads to significantly higher accuracy and enables new opportunities for learning in complex changing conditions.
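The two mechanisms named in the abstract, a Bayesian relevance estimate that multiplies a likelihood by a transition prior, and a Hoeffding-bound test for when to change the active state, can be illustrated with a minimal Python sketch. This is not SELeCT's implementation. It assumes, purely for illustration, that each state is summarised by a Gaussian over a single scalar feature, that the transition prior comes from add-one-smoothed counts of transitions out of the current state, and that observations in the window are independent. The names `state_posterior`, `hoeffding_epsilon`, `should_switch`, `transitions`, and `delta` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical state summary: (mean, std) of one scalar feature.
StateSummary = Tuple[float, float]


def log_gaussian(x: float, mean: float, std: float) -> float:
    """Log-density of x under N(mean, std^2)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def state_posterior(
    window: List[float],
    states: Dict[str, StateSummary],
    transitions: Dict[Tuple[str, str], int],
    current: str,
) -> Dict[str, float]:
    """P(s | window) is proportional to P(window | s) * P(s | current)."""
    # Transition prior out of the current state, add-one smoothed so that
    # states never seen after `current` keep a non-zero prior.
    counts = {s: transitions.get((current, s), 0) + 1 for s in states}
    total = sum(counts.values())
    log_post = {}
    for s, (mean, std) in states.items():
        # Assumes window observations are independent given the state.
        log_lik = sum(log_gaussian(x, mean, std) for x in window)
        log_post[s] = log_lik + math.log(counts[s] / total)
    # Normalise in log space to avoid underflow on long windows.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {s: math.exp(v - top) / z for s, v in log_post.items()}


def hoeffding_epsilon(n: int, delta: float, value_range: float = 2.0) -> float:
    """One-sided Hoeffding bound: with probability >= 1 - delta, the true mean
    exceeds the mean of n independent samples (spanning `value_range`) minus epsilon."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))


def should_switch(diffs: List[float], delta: float = 0.05) -> bool:
    """Switch only if the mean of per-observation differences
    p(candidate) - p(current), each in [-1, 1], exceeds epsilon, so the
    candidate's expected posterior is higher with probability >= 1 - delta."""
    n = len(diffs)
    if n == 0:
        return False
    mean_diff = sum(diffs) / n
    return mean_diff > hoeffding_epsilon(n, delta)
```

In use, one would recompute the posterior as each observation arrives, append `p(candidate) - p(current)` for the leading alternative state to `diffs`, and change the active classifier only when `should_switch` returns `True`. Because epsilon shrinks as evidence accumulates, a switch needs a large, sustained advantage when little data has been seen, which is one way to trade responsiveness against temporal stability.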

Original language: English
Title of host publication: Proceedings - 2022 IEEE 9th International Conference on Data Science and Advanced Analytics, DSAA 2022
Editors: Joshua Zhexue Huang, Yi Pan, Barbara Hammer, Muhammad Khurram Khan, Xing Xie, Laizhong Cui, Yulin He
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665473309
Publication status: Published - 1 Jan 2022
Event: 9th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2022 - Shenzhen, China
Duration: 13 Oct 2022 to 16 Oct 2022

Publication series

Name: Proceedings - 2022 IEEE 9th International Conference on Data Science and Advanced Analytics, DSAA 2022

Conference

Conference: 9th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2022
Country/Territory: China
City: Shenzhen
Period: 13/10/22 to 16/10/22

Keywords

  • Data Streams
  • Recurring Concepts
