Skip to main navigation Skip to search Skip to main content

Incremental k-Nearest Neighbors Using Reservoir Sampling for Data Streams

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The online and potentially infinite nature of data streams leads to the inability to store the flow in its entirety and thus restricts the storage to a part of – and/or synopsis information from – the stream. To process these evolving data, we need efficient and accurate methodologies and systems, such as window models (e.g., sliding windows) and summarization techniques (e.g., sampling, sketching, dimensionality reduction). In this paper, we propose, RW-kNN, a k-Nearest Neighbors (kNN) algorithm that employs a practical way to store information about past instances using the biased reservoir sampling to sample the input instances along with a sliding window to maintain the most recent instances from the stream. We evaluate our proposal on a diverse set of synthetic and real datasets and compare against state-of-the-art algorithms in a traditional test-then-train evaluation. Results show how our proposed RW-kNN approach produces high-predictive performance for both real and synthetic datasets while using a feasible amount of resources.

Original languageEnglish
Title of host publicationDiscovery Science - 24th International Conference, DS 2021, Proceedings
EditorsCarlos Soares, Luis Torgo
PublisherSpringer Science and Business Media Deutschland GmbH
Pages122-137
Number of pages16
ISBN (Print)9783030889418
DOIs
Publication statusPublished - 1 Jan 2021
Event24th International Conference on Discovery Science, DS 2021 - Virtual, Online
Duration: 11 Oct 202113 Oct 2021

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12986 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference24th International Conference on Discovery Science, DS 2021
CityVirtual, Online
Period11/10/2113/10/21

Keywords

  • Data stream classification
  • K-nearest neighbors
  • Reservoir sampling
  • Sliding window

Fingerprint

Dive into the research topics of 'Incremental k-Nearest Neighbors Using Reservoir Sampling for Data Streams'. Together they form a unique fingerprint.

Cite this