TY - GEN
T1 - Clustering based active learning for evolving data streams
AU - Ienco, Dino
AU - Bifet, Albert
AU - Žliobaitė, Indrė
AU - Pfahringer, Bernhard
PY - 2013/1/1
Y1 - 2013/1/1
N2 - Data labeling is an expensive and time-consuming task. Choosing which labels to use is becoming increasingly important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction of all instances. While many works address this issue in non-streaming scenarios, few exist for the data stream setting. In this paper we propose a new active learning approach for evolving data streams, based on a pre-clustering step, for selecting the most informative instances for labeling. We consider a batch-incremental setting: when a new batch arrives, we first cluster the examples and then select the best instances to train the learner. The clustering approach makes it possible to cover the whole data space and avoids oversampling examples from only a few areas. We compare our method with state-of-the-art active learning strategies on real datasets. The results highlight the improvement in performance of our proposal. Experiments on parameter sensitivity are also reported.
AB - Data labeling is an expensive and time-consuming task. Choosing which labels to use is becoming increasingly important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction of all instances. While many works address this issue in non-streaming scenarios, few exist for the data stream setting. In this paper we propose a new active learning approach for evolving data streams, based on a pre-clustering step, for selecting the most informative instances for labeling. We consider a batch-incremental setting: when a new batch arrives, we first cluster the examples and then select the best instances to train the learner. The clustering approach makes it possible to cover the whole data space and avoids oversampling examples from only a few areas. We compare our method with state-of-the-art active learning strategies on real datasets. The results highlight the improvement in performance of our proposal. Experiments on parameter sensitivity are also reported.
U2 - 10.1007/978-3-642-40897-7_6
DO - 10.1007/978-3-642-40897-7_6
M3 - Conference contribution
AN - SCOPUS:84888318327
SN - 9783642408960
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 79
EP - 93
BT - Discovery Science - 16th International Conference, DS 2013, Proceedings
PB - Springer Verlag
T2 - 16th International Conference on Discovery Science, DS 2013
Y2 - 6 October 2013 through 9 October 2013
ER -