TY - GEN
T1 - A Probabilistic Framework for Adapting to Changing and Recurring Concepts in Data Streams
AU - Halstead, Ben
AU - Koh, Yun Sing
AU - Riddle, Patricia
AU - Pechenizkiy, Mykola
AU - Bifet, Albert
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for current data. Learning from irrelevant experience describing a different concept can degrade performance. A system learning from streaming data must identify which recent experience is irrelevant when conditions change and which past experience is relevant when concepts reoccur, e.g., when weather events or financial patterns repeat. Existing streaming approaches either do not consider experience to change in relevance over time and thus cannot handle concept drift, or only consider the recency of experience and thus cannot handle recurring concepts, or only sparsely evaluate relevance and thus fail when concept drift is missed. To enable learning in changing conditions, we propose SELeCT, a probabilistic method for continuously evaluating the relevance of past experience. SELeCT maintains a distinct internal state for each concept, representing relevant experience with a unique classifier. We propose a Bayesian algorithm for estimating state relevance, combining the likelihood of drawing recent observations from a given state with a transition pattern prior based on the system's current state. The current state is continuously maintained using a Hoeffding-bound-based algorithm which, unlike existing methods, guarantees that every observation is classified using the state estimated as the most relevant, while also maintaining temporal stability. We find SELeCT is able to choose experience relevant to ground truth concepts with recall and precision above 0.9, significantly outperforming existing methods and close to a theoretical optimum, leading to significantly higher accuracy and enabling new opportunities for learning in complex changing conditions.
KW - Data Streams
KW - Recurring Concepts
U2 - 10.1109/DSAA54385.2022.10032368
DO - 10.1109/DSAA54385.2022.10032368
M3 - Conference contribution
AN - SCOPUS:85148538349
T3 - Proceedings - 2022 IEEE 9th International Conference on Data Science and Advanced Analytics, DSAA 2022
BT - Proceedings - 2022 IEEE 9th International Conference on Data Science and Advanced Analytics, DSAA 2022
A2 - Huang, Joshua Zhexue
A2 - Pan, Yi
A2 - Hammer, Barbara
A2 - Khan, Muhammad Khurram
A2 - Xie, Xing
A2 - Cui, Laizhong
A2 - He, Yulin
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2022
Y2 - 13 October 2022 through 16 October 2022
ER -