Scalable model-based cascaded imputation of missing data

Jacob Montiel, Jesse Read, Albert Bifet, Talel Abdessalem

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Missing data is a common trait of real-world data that can negatively impact interpretability. In this paper, we present Cascade Imputation (CIM), an effective and scalable technique for automatic imputation of missing data. CIM is not restrictive on the characteristics of the data set, providing support for: Missing At Random and Missing Completely At Random data, numerical and nominal attributes, and large data sets including highly dimensional data sets. We compare CIM against well-established imputation techniques over a variety of data sets under multiple test configurations to measure the impact of imputation on the classification problem. Test results show that CIM outperforms other imputation methods over multiple test conditions. Additionally, we identify optimal performance and failure conditions for popular imputation techniques.

Original languageEnglish
Title of host publicationAdvances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference, PAKDD 2018, Proceedings
EditorsGeoffrey I. Webb, Dinh Phung, Mohadeseh Ganji, Lida Rashidi, Vincent S. Tseng, Bao Ho
PublisherSpringer Verlag
Pages64-76
Number of pages13
ISBN (Print)9783319930398
DOIs
Publication statusPublished - 1 Jan 2018
Externally publishedYes
Event22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018 - Melbourne, Australia
Duration: 3 Jun 20186 Jun 2018

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume10939 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018
Country/TerritoryAustralia
CityMelbourne
Period3/06/186/06/18

Keywords

  • Classification
  • Imputation
  • Missing data

Fingerprint

Dive into the research topics of 'Scalable model-based cascaded imputation of missing data'. Together they form a unique fingerprint.

Cite this