TY - JOUR
T1 - Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression
AU - Clarté, Lucas
AU - Vandenbroucque, Adrien
AU - Dalle, Guillaume
AU - Loureiro, Bruno
AU - Krzakala, Florent
AU - Zdeborová, Lenka
N1 - Publisher Copyright:
© 2024 Proceedings of Machine Learning Research. All rights reserved.
PY - 2024/1/1
AB - We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, the bootstrap, and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, in the limit where the number of samples and the dimension of the covariates grow at a comparable fixed rate. Our findings are threefold: i) resampling methods are fraught with problems in high dimensions and exhibit the double-descent-like behavior typical of these situations; ii) only when the sampling ratio is large enough do they provide consistent and reliable error estimates (we give convergence rates); iii) in the over-parametrized regime relevant to modern machine learning practice, their predictions are not consistent, even with optimal regularization.
UR - https://www.scopus.com/pages/publications/85212221576
M3 - Conference article
AN - SCOPUS:85212221576
SN - 2640-3498
VL - 244
SP - 787
EP - 819
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 40th Conference on Uncertainty in Artificial Intelligence, UAI 2024
Y2 - 15 July 2024 through 19 July 2024
ER -