TY - GEN
T1 - Incremental k-Nearest Neighbors Using Reservoir Sampling for Data Streams
AU - Bahri, Maroua
AU - Bifet, Albert
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - The online and potentially infinite nature of data streams leads to the inability to store the flow in its entirety and thus restricts the storage to a part of – and/or synopsis information from – the stream. To process these evolving data, we need efficient and accurate methodologies and systems, such as window models (e.g., sliding windows) and summarization techniques (e.g., sampling, sketching, dimensionality reduction). In this paper, we propose, RW-kNN, a k-Nearest Neighbors (kNN) algorithm that employs a practical way to store information about past instances using the biased reservoir sampling to sample the input instances along with a sliding window to maintain the most recent instances from the stream. We evaluate our proposal on a diverse set of synthetic and real datasets and compare against state-of-the-art algorithms in a traditional test-then-train evaluation. Results show how our proposed RW-kNN approach produces high-predictive performance for both real and synthetic datasets while using a feasible amount of resources.
AB - The online and potentially infinite nature of data streams leads to the inability to store the flow in its entirety and thus restricts the storage to a part of – and/or synopsis information from – the stream. To process these evolving data, we need efficient and accurate methodologies and systems, such as window models (e.g., sliding windows) and summarization techniques (e.g., sampling, sketching, dimensionality reduction). In this paper, we propose, RW-kNN, a k-Nearest Neighbors (kNN) algorithm that employs a practical way to store information about past instances using the biased reservoir sampling to sample the input instances along with a sliding window to maintain the most recent instances from the stream. We evaluate our proposal on a diverse set of synthetic and real datasets and compare against state-of-the-art algorithms in a traditional test-then-train evaluation. Results show how our proposed RW-kNN approach produces high-predictive performance for both real and synthetic datasets while using a feasible amount of resources.
KW - Data stream classification
KW - K-nearest neighbors
KW - Reservoir sampling
KW - Sliding window
UR - https://www.scopus.com/pages/publications/85118106948
U2 - 10.1007/978-3-030-88942-5_10
DO - 10.1007/978-3-030-88942-5_10
M3 - Conference contribution
AN - SCOPUS:85118106948
SN - 9783030889418
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 122
EP - 137
BT - Discovery Science - 24th International Conference, DS 2021, Proceedings
A2 - Soares, Carlos
A2 - Torgo, Luis
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Discovery Science, DS 2021
Y2 - 11 October 2021 through 13 October 2021
ER -