A factorized version space algorithm for "human-in-the-loop" data exploration

  • UMass Amherst

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Active learning (AL) has recently been applied to help users explore a large database and retrieve data instances of interest, but existing methods often require a large number of labeled instances to achieve good accuracy. To address this slow convergence problem, our work augments version space-based AL algorithms, which have strong theoretical convergence results but are very costly to run, with additional insights obtained from the user labeling process. These insights lead to a novel algorithm that factorizes the version space so that active learning is performed in a set of subspaces. Our work offers theoretical results on optimality and approximation for this algorithm, as well as optimizations for better performance. Evaluation results show that our factorized version space algorithm significantly outperforms other version space algorithms, as well as a recent factorization-aware algorithm, for large database exploration.
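The idea of running version space-based active learning separately in each subspace can be illustrated with a toy sketch. This is not the paper's algorithm. It assumes that the user's interest is a conjunction of one-dimensional threshold predicates, that the user can give a separate yes/no label for each subspace, and that queries are chosen by a simple greedy halving score. All names (`consistent`, `positive_fraction`, `explore`, the `oracle` callback) are hypothetical.

```python
from itertools import product


def consistent(thresholds, labeled):
    """Keep thresholds t whose classifier (v <= t) agrees with every (value, label) pair."""
    return [t for t in thresholds if all((v <= t) == lab for v, lab in labeled)]


def positive_fraction(thresholds, v):
    """Fraction of the surviving hypotheses that classify value v as positive."""
    return sum(v <= t for t in thresholds) / len(thresholds)


def explore(points, oracle, thresholds_per_dim, budget):
    """Toy factorized exploration: one version space (list of thresholds) per dimension.

    ASSUMPTION: oracle(p) returns one boolean label per subspace, and the labels
    are consistent with some threshold in each candidate list.
    """
    dims = len(thresholds_per_dim)
    spaces = [list(ts) for ts in thresholds_per_dim]
    labeled = [[] for _ in range(dims)]
    queried = set()
    n_queries = 0

    def score(p):
        # Greedy halving heuristic: prefer points that split every
        # subspace version space as close to 50/50 as possible.
        return sum(abs(positive_fraction(spaces[d], p[d]) - 0.5) for d in range(dims))

    for _ in range(budget):
        candidates = [p for p in points if p not in queried]
        if not candidates or all(len(s) == 1 for s in spaces):
            break
        q = min(candidates, key=score)
        queried.add(q)
        n_queries += 1
        labels = oracle(q)
        for d in range(dims):
            labeled[d].append((q[d], labels[d]))
            spaces[d] = consistent(spaces[d], labeled[d])
    return spaces, n_queries


# Hypothetical target: the user wants points with x <= 6 AND y <= 2.
points = list(product(range(10), repeat=2))
oracle = lambda p: (p[0] <= 6, p[1] <= 2)
spaces, n_queries = explore(points, oracle, [range(10), range(10)], budget=10)
print(spaces, n_queries)
```

In this sketch each query shrinks both one-dimensional version spaces at once, so the number of labels needed grows with the size of the largest subspace version space rather than with the size of their product.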

Original language: English
Title of host publication: Proceedings - 19th IEEE International Conference on Data Mining, ICDM 2019
Editors: Jianyong Wang, Kyuseok Shim, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1018-1023
Number of pages: 6
ISBN (Electronic): 9781728146034
Publication status: Published - 1 Nov 2019
Event: 19th IEEE International Conference on Data Mining, ICDM 2019 - Beijing, China
Duration: 8 Nov 2019 to 11 Nov 2019

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2019-November
ISSN (Print): 1550-4786

Conference

Conference: 19th IEEE International Conference on Data Mining, ICDM 2019
Country/Territory: China
City: Beijing
Period: 8/11/19 to 11/11/19

Keywords

  • Active learning
  • Data exploration
  • Version space
