TY - GEN
T1 - Fingerprinting concepts in data streams with supervised and unsupervised meta-information
AU - Halstead, Ben
AU - Koh, Yun Sing
AU - Riddle, Patricia
AU - Pechenizkiy, Mykola
AU - Bifet, Albert
AU - Pears, Russel
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/4/1
Y1 - 2021/4/1
N2 - Streaming sources of data are becoming more common as the ability to collect data in real-time grows. A major concern in dealing with data streams is concept drift, a change in the distribution of data over time, for example, due to changes in environmental conditions. Representing concepts (stationary periods featuring similar behaviour) is a key idea in adapting to concept drift. By testing the similarity of a concept representation to a window of observations, we can detect concept drift to a new or previously seen recurring concept. Concept representations are constructed using meta-information features, values describing aspects of concept behaviour. We find that previously proposed concept representations rely on small numbers of meta-information features. These representations often cannot distinguish concepts, leaving systems vulnerable to concept drift. We propose FiCSUM, a general framework to represent both supervised and unsupervised behaviours of a concept in a fingerprint, a vector of many distinct meta-information features able to uniquely identify more concepts. Our dynamic weighting strategy learns which meta-information features describe concept drift in a given dataset, allowing a diverse set of meta-information features to be used at once. FiCSUM outperforms state-of-the-art methods over a range of 11 real world and synthetic datasets in both accuracy and modeling underlying concept drift.
AB - Streaming sources of data are becoming more common as the ability to collect data in real-time grows. A major concern in dealing with data streams is concept drift, a change in the distribution of data over time, for example, due to changes in environmental conditions. Representing concepts (stationary periods featuring similar behaviour) is a key idea in adapting to concept drift. By testing the similarity of a concept representation to a window of observations, we can detect concept drift to a new or previously seen recurring concept. Concept representations are constructed using meta-information features, values describing aspects of concept behaviour. We find that previously proposed concept representations rely on small numbers of meta-information features. These representations often cannot distinguish concepts, leaving systems vulnerable to concept drift. We propose FiCSUM, a general framework to represent both supervised and unsupervised behaviours of a concept in a fingerprint, a vector of many distinct meta-information features able to uniquely identify more concepts. Our dynamic weighting strategy learns which meta-information features describe concept drift in a given dataset, allowing a diverse set of meta-information features to be used at once. FiCSUM outperforms state-of-the-art methods over a range of 11 real world and synthetic datasets in both accuracy and modeling underlying concept drift.
KW - Concept Drift
KW - Data Stream
KW - Meta-Information
KW - Recurring Concepts
UR - https://www.scopus.com/pages/publications/85112864007
U2 - 10.1109/ICDE51399.2021.00096
DO - 10.1109/ICDE51399.2021.00096
M3 - Conference contribution
AN - SCOPUS:85112864007
T3 - Proceedings - International Conference on Data Engineering
SP - 1056
EP - 1067
BT - Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
PB - IEEE Computer Society
T2 - 37th IEEE International Conference on Data Engineering, ICDE 2021
Y2 - 19 April 2021 through 22 April 2021
ER -