
Naive imputation implicitly regularizes high-dimensional linear models

  • Sorbonne Université
  • Ecole polytechnique

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

Two different approaches exist to handle missing values for prediction: either imputation, prior to fitting any predictive algorithm, or dedicated methods able to natively incorporate missing values. While imputation is widely (and easily) used, it is unfortunately biased when low-capacity predictors (such as linear models) are applied afterward. However, in practice, naive imputation exhibits good predictive performance. In this paper, we study the impact of imputation in a high-dimensional linear model with MCAR missing data. We prove that zero imputation performs an implicit regularization closely related to the ridge method, often used in high-dimensional problems. Leveraging this connection, we establish that the imputation bias is controlled by a ridge bias, which vanishes in high dimension. As a predictor, we argue in favor of the averaged SGD strategy, applied to zero-imputed data. We establish an upper bound on its generalization error, highlighting that imputation is benign in the d ≫ √n regime. Experiments illustrate our findings.
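The strategy named in the abstract (averaged SGD applied to zero-imputed data) can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: the function names, the isotropic Gaussian toy data, the observation probability `rho`, the constant step size `1 / (2 * dim)`, and the sample sizes are all hypothetical choices, and the sketch is not meant to reproduce the authors' experiments.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def zero_impute(x, mask):
    """Replace missing entries (mask[j] == 0) with 0."""
    return [xj if mj else 0.0 for xj, mj in zip(x, mask)]


def averaged_sgd(samples, dim, step):
    """One pass of constant-step SGD on the squared loss.

    Returns the running average of the iterates rather than the last one.
    """
    theta = [0.0] * dim
    theta_bar = [0.0] * dim
    for t, (x, y) in enumerate(samples, start=1):
        residual = dot(theta, x) - y
        theta = [tj - step * residual * xj for tj, xj in zip(theta, x)]
        theta_bar = [bj + (tj - bj) / t for bj, tj in zip(theta_bar, theta)]
    return theta_bar


def run_demo(seed=0, dim=100, n_train=1000, n_test=500, rho=0.7, noise=0.1):
    """Toy experiment (assumed setup): linear model, MCAR Bernoulli(rho) masks."""
    rng = random.Random(seed)
    theta_star = [1.0 / dim ** 0.5] * dim  # hypothetical ground-truth parameter

    def draw():
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = dot(theta_star, x) + noise * rng.gauss(0.0, 1.0)
        # MCAR: each coordinate is observed independently of x and y.
        mask = [1 if rng.random() < rho else 0 for _ in range(dim)]
        return zero_impute(x, mask), y

    train = [draw() for _ in range(n_train)]
    test = [draw() for _ in range(n_test)]
    theta_bar = averaged_sgd(train, dim, step=1.0 / (2 * dim))
    mse_sgd = sum((dot(theta_bar, x) - y) ** 2 for x, y in test) / n_test
    mse_zero = sum(y ** 2 for _, y in test) / n_test  # baseline: always predict 0
    return mse_sgd, mse_zero


if __name__ == "__main__":
    mse_sgd, mse_zero = run_demo()
    print(f"averaged SGD on zero-imputed data: {mse_sgd:.3f}")
    print(f"always-predict-zero baseline:      {mse_zero:.3f}")
```

The test error is measured on zero-imputed test points, so the predictor is evaluated in the same missing-data setting it was trained in. The baseline is only a sanity check that the averaged iterate has learned something; it says nothing about the bounds established in the paper.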

Original language: English
Pages (from-to): 1320-1340
Number of pages: 21
Journal: Proceedings of Machine Learning Research
Volume: 202
Publication status: Published - 1 Jan 2023
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 2023 to 29 Jul 2023
