What's a good imputation to predict with missing values?

  • Université Paris-Saclay
  • INRIA
  • Ecole polytechnique

Research output: Chapter in a book, report, anthology, or collection › Conference contribution › Peer-reviewed

Abstract

How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-values mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation is not needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting instead the imputation so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments confirm that joint imputation and regression through NeuMiss is better than various two-step procedures with a finite number of samples.
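
The impute-then-regress idea in the abstract can be illustrated with a small toy sketch. This is not the paper's method or NeuMiss: the data-generating process, the helper names (`make_data`, `impute_constant`, `BinnedRegressor`, `mse_on_missing_rows`), and the grid-based learner are all assumptions made for illustration. Values are missing completely at random here only to keep the Bayes predictor easy to write down, and the fill constants are chosen outside the range of the observed values. Under those assumptions, two arbitrary fill constants give the same predictions, and both come close to the Bayes risk on rows with a missing value.

```python
import math
import random
from collections import defaultdict


def make_data(n, seed, p_missing=0.4):
    """Toy data (assumed): y = x1 + x2 + noise, x2 missing completely at random."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.random()
        x2 = x1 + 0.1 * rng.gauss(0.0, 1.0)        # x2 is correlated with x1
        y = x1 + x2 + 0.05 * rng.gauss(0.0, 1.0)
        x2_seen = None if rng.random() < p_missing else x2
        rows.append((x1, x2_seen, y))
    return rows


def impute_constant(rows, fill):
    """Replace every missing x2 with the same constant."""
    return [(x1, fill if x2 is None else x2, y) for x1, x2, y in rows]


class BinnedRegressor:
    """Piecewise-constant fit on a 2-D grid; a stand-in for a flexible learner."""

    def __init__(self, width=0.05):
        self.width = width
        self.means = {}
        self.default = 0.0

    def _cell(self, x1, x2):
        return (math.floor(x1 / self.width), math.floor(x2 / self.width))

    def fit(self, rows):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for x1, x2, y in rows:
            cell = self._cell(x1, x2)
            sums[cell] += y
            counts[cell] += 1
        self.means = {c: sums[c] / counts[c] for c in sums}
        # Fallback for grid cells never seen during training.
        self.default = sum(y for _, _, y in rows) / len(rows)
        return self

    def predict(self, x1, x2):
        return self.means.get(self._cell(x1, x2), self.default)


def mse_on_missing_rows(fill, train, test):
    """Impute, fit, then score only the test rows whose x2 is missing."""
    model = BinnedRegressor().fit(impute_constant(train, fill))
    errors = [(y - model.predict(x1, fill)) ** 2
              for x1, x2, y in test if x2 is None]
    return sum(errors) / len(errors)


train = make_data(20000, seed=0)
test = make_data(5000, seed=1)

# With x2 missing completely at random, E[y | x1, x2 missing] = 2 * x1.
bayes_errors = [(y - 2.0 * x1) ** 2 for x1, x2, y in test if x2 is None]
bayes_mse = sum(bayes_errors) / len(bayes_errors)

mse_low = mse_on_missing_rows(-5.0, train, test)   # arbitrary fill below the data
mse_high = mse_on_missing_rows(7.3, train, test)   # arbitrary fill above the data

print("Bayes MSE on rows with missing x2:", bayes_mse)
print("Impute with -5.0, then regress:   ", mse_low)
print("Impute with 7.3, then regress:    ", mse_high)
```

The grid learner treats the imputed value as a marker for the missingness pattern and fits the outcome as a function of `x1` within that group, which is why the specific constant does not matter in this toy setting. A fill value inside the range of observed `x2` would mix imputed and observed rows in the same grid cells, a case this sketch does not cover.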

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Editors: Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
Publisher: Neural information processing systems foundation
Pages: 11530-11540
Number of pages: 11
ISBN (Electronic): 9781713845393
Publication status: Published - 1 Jan 2021
Externally published: Yes
Event: 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online
Duration: 6 Dec 2021 to 14 Dec 2021

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 14
ISSN (Print): 1049-5258

Conference

Conference: 35th Conference on Neural Information Processing Systems, NeurIPS 2021
City: Virtual, Online
Period: 6 Dec 2021 to 14 Dec 2021
