Fingerprinting concepts in data streams with supervised and unsupervised meta-information

  • Ben Halstead
  • Yun Sing Koh
  • Patricia Riddle
  • Mykola Pechenizkiy
  • Albert Bifet
  • Russel Pears

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Streaming sources of data are becoming more common as the ability to collect data in real time grows. A major concern in dealing with data streams is concept drift, a change in the distribution of data over time, for example, due to changes in environmental conditions. Representing concepts (stationary periods featuring similar behaviour) is a key idea in adapting to concept drift. By testing the similarity of a concept representation to a window of observations, we can detect concept drift to a new or previously seen recurring concept. Concept representations are constructed using meta-information features, values describing aspects of concept behaviour. We find that previously proposed concept representations rely on small numbers of meta-information features. These representations often cannot distinguish concepts, leaving systems vulnerable to concept drift. We propose FiCSUM, a general framework to represent both supervised and unsupervised behaviours of a concept in a fingerprint, a vector of many distinct meta-information features able to uniquely identify more concepts. Our dynamic weighting strategy learns which meta-information features describe concept drift in a given dataset, allowing a diverse set of meta-information features to be used at once. FiCSUM outperforms state-of-the-art methods over a range of 11 real-world and synthetic datasets in both accuracy and modeling underlying concept drift.
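
The idea of comparing a stored concept representation with a window of observations can be illustrated with a minimal sketch. This is not the FiCSUM implementation: the four meta-information features, the function names (`fingerprint`, `weighted_cosine`, `is_drift`), the fixed weights, and the 0.9 threshold are all assumptions chosen for illustration, whereas the paper describes many more features and learns the weights dynamically. The features are also left unscaled here, which a practical system would likely need to address.

```python
import math
import statistics
from typing import Sequence


def fingerprint(xs: Sequence[float], errors: Sequence[int]) -> list[float]:
    """Summarise a window as a small vector of meta-information features.

    xs:     values of one input feature in the window (unsupervised behaviour).
    errors: 1 where a classifier was wrong, 0 where it was right
            (supervised behaviour).
    The choice of features is hypothetical, for illustration only.
    """
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    # Lag-1 autocorrelation: a simple measure of temporal dependence.
    if std > 0:
        acf1 = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
        acf1 /= len(xs) * std ** 2
    else:
        acf1 = 0.0
    error_rate = statistics.fmean(errors)
    return [mean, std, acf1, error_rate]


def weighted_cosine(u: Sequence[float], v: Sequence[float],
                    w: Sequence[float]) -> float:
    """Cosine similarity in which each dimension is scaled by a
    non-negative weight (fixed here, not learned)."""
    dot = sum(wi * a * b for wi, a, b in zip(w, u, v))
    norm_u = math.sqrt(sum(wi * a * a for wi, a in zip(w, u)))
    norm_v = math.sqrt(sum(wi * b * b for wi, b in zip(w, v)))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_drift(concept_fp: Sequence[float], window_fp: Sequence[float],
             weights: Sequence[float], threshold: float = 0.9) -> bool:
    """Flag drift when the window no longer resembles the stored concept.
    The threshold value is an arbitrary assumption."""
    return weighted_cosine(concept_fp, window_fp, weights) < threshold


# Two toy windows with clearly different behaviour.
concept_fp = fingerprint([0.0, 1.0] * 4, [0, 0, 0, 0, 0, 0, 0, 1])
window_fp = fingerprint([5.0] * 4 + [6.0] * 4, [1, 1, 0, 1, 1, 0, 1, 1])
weights = [1.0, 1.0, 1.0, 1.0]

print(is_drift(concept_fp, concept_fp, weights))  # same behaviour
print(is_drift(concept_fp, window_fp, weights))   # different behaviour
```

In this sketch, raising the weight of a feature makes differences along that dimension count for more in the similarity, which is the role the abstract assigns to the learned weights.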

Original language: English
Title of host publication: Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
Publisher: IEEE Computer Society
Pages: 1056-1067
Number of pages: 12
ISBN (Electronic): 9781728191843
Publication status: Published - 1 Apr 2021
Externally published: Yes
Event: 37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Online, Chania, Greece
Duration: 19 Apr 2021 to 22 Apr 2021

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2021-April
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 37th IEEE International Conference on Data Engineering, ICDE 2021
Country/Territory: Greece
City: Virtual, Online, Chania
Period: 19/04/21 to 22/04/21

Keywords

  • Concept Drift
  • Data Stream
  • Meta-Information
  • Recurring Concepts
