TY - GEN
T1 - On Single Index Models beyond Gaussian Data
AU - Bruna, Joan
AU - Pillaud-Vivien, Loucas
AU - Zweig, Aaron
N1 - Publisher Copyright:
© 2023 Neural information processing systems foundation. All rights reserved.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - Sparse high-dimensional functions have arisen as a rich framework to study the behavior of gradient-descent methods using shallow neural networks, showcasing their ability to perform feature learning beyond linear models. Amongst those functions, the simplest are single-index models f(x) = ϕ(x · θ∗), where the labels are generated by an arbitrary non-linear scalar link function ϕ applied to an unknown one-dimensional projection θ∗ of the input data. By focusing on Gaussian data, several recent works have built a remarkable picture, where the so-called information exponent (related to the regularity of the link function) controls the required sample complexity. In essence, these tools exploit the stability and spherical symmetry of Gaussian distributions. In this work, building on the framework of Ben Arous et al. [2021], we explore extensions of this picture beyond the Gaussian setting, where either stability or symmetry might be violated. Focusing on the planted setting where ϕ is known, our main results establish that Stochastic Gradient Descent can efficiently recover the unknown direction θ∗ in the high-dimensional regime, under assumptions that extend the previous works of Yehudai and Shamir [2020] and Wu [2022].
UR - https://www.scopus.com/pages/publications/85191184656
M3 - Conference contribution
AN - SCOPUS:85191184656
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
A2 - Oh, A.
A2 - Naumann, T.
A2 - Globerson, A.
A2 - Saenko, K.
A2 - Hardt, M.
A2 - Levine, S.
PB - Neural information processing systems foundation
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -