TY - GEN
T1 - Learning from time-changing data with adaptive windowing
AU - Bifet, Albert
AU - Gavaldà, Ricard
PY - 2007/1/1
Y1 - 2007/1/1
N2 - We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, instead of being fixed a priori, is recomputed online according to the rate of change observed from the data in the window itself. This delivers the user or programmer from having to guess a time-scale for change. Contrary to many related works, we provide rigorous guarantees of performance, as bounds on the rates of false positives and false negatives. Using ideas from data stream algorithmics, we develop a time-and memory-efficient version of this algorithm, called ADWIN2. We show how to combine ADWIN2 with the Naïve Bayes (NB) predictor, in two ways: one, using it to monitor the error rate of the current model and declare when revision is necessary and, two, putting it inside the NB predictor to maintain up-to-date estimations of conditional probabilities in the data. We test our approach using synthetic and real data streams and compare them to both fixed-size and variable-size window strategies with good results.
AB - We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, instead of being fixed a priori, is recomputed online according to the rate of change observed from the data in the window itself. This delivers the user or programmer from having to guess a time-scale for change. Contrary to many related works, we provide rigorous guarantees of performance, as bounds on the rates of false positives and false negatives. Using ideas from data stream algorithmics, we develop a time-and memory-efficient version of this algorithm, called ADWIN2. We show how to combine ADWIN2 with the Naïve Bayes (NB) predictor, in two ways: one, using it to monitor the error rate of the current model and declare when revision is necessary and, two, putting it inside the NB predictor to maintain up-to-date estimations of conditional probabilities in the data. We test our approach using synthetic and real data streams and compare them to both fixed-size and variable-size window strategies with good results.
KW - Concept and distribution drift
KW - Data streams
KW - Naïve bayes
KW - Time-changing data
U2 - 10.1137/1.9781611972771.42
DO - 10.1137/1.9781611972771.42
M3 - Conference contribution
AN - SCOPUS:70449100647
SN - 9780898716306
T3 - Proceedings of the 7th SIAM International Conference on Data Mining
SP - 443
EP - 448
BT - Proceedings of the 7th SIAM International Conference on Data Mining
PB - Society for Industrial and Applied Mathematics Publications
T2 - 7th SIAM International Conference on Data Mining
Y2 - 26 April 2007 through 28 April 2007
ER -