Abstract
The Receiver Operating Characteristic (ROC) curve is one of the most widely used visual tools to evaluate performance of scoring functions regarding their capacities to discriminate between two populations. It is the goal of this paper to propose a statistical learning method for constructing a scoring function with nearly optimal ROC curve. In this bipartite setup, the target is known to be the regression function up to an increasing transform, and solving the optimization problem boils down to recovering the collection of level sets of the latter, which we interpret here as a continuum of imbricated classification problems. We propose a discretization approach, consisting of building a finite sequence of N classifiers by constrained empirical risk minimization and then constructing a piecewise constant scoring function sN(x) by overlaying the resulting classifiers. Given the functional nature of the ROC criterion, the accuracy of the ranking induced by sN(x) can be conceived in a variety of ways, depending on the distance chosen for measuring closeness to the optimal curve in the ROC space. By relating the ROC curve of the resulting scoring function to piecewise linear approximates of the optimal ROC curve, we establish the consistency of the method as well as rate bounds to control its generalization ability in sup-norm. Eventually, we also highlight the fact that, as a byproduct, the algorithm proposed provides an accurate estimate of the optimal ROC curve.
| Original language | English |
|---|---|
| Pages (from-to) | 619-648 |
| Number of pages | 30 |
| Journal | Constructive Approximation |
| Volume | 32 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - 1 Jan 2010 |
Keywords
- AUC criterion
- Bipartite ranking
- Density level set
- Minimum volume set estimation
- Piecewise linear approximation
- ROC curve
- Scoring function
- Statistical learning
- Sup-norm