On the prediction loss of the lasso in the partially labeled setting

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper we revisit the risk bounds of the lasso estimator in the context of transductive and semi-supervised learning. In other words, the setting under consideration is that of regression with random design under partial labeling. The main goal is to obtain user-friendly bounds on the off-sample prediction risk. To this end, the simple setting of a bounded response variable and bounded (high-dimensional) covariates is considered. We propose some new adaptations of the lasso to these settings and establish oracle inequalities both in expectation and in deviation. These results provide non-asymptotic upper bounds on the risk that highlight the interplay between the bias due to the mis-specification of the linear model, the bias due to the approximate sparsity, and the variance. They also demonstrate that the presence of a large number of unlabeled features may have a significant positive impact in situations where the restricted eigenvalue of the design matrix vanishes or is very small.

Original language: English
Pages (from-to): 3443-3472
Number of pages: 30
Journal: Electronic Journal of Statistics
Volume: 12
Issue number: 2
DOIs
Publication status: Published - 1 Jan 2018
Externally published: Yes

Keywords

  • High-dimensional regression
  • Lasso
  • Oracle inequality
  • Semi-supervised learning
  • Sparsity
  • Transductive learning
