A stochastic algorithm for feature selection in pattern recognition

Abstract
We introduce a new model that addresses feature selection from a large dictionary of variables computable from a signal or an image. Features are extracted according to an efficiency criterion, on the basis of specified classification or recognition tasks. This is done by estimating a probability distribution ℙ on the complete dictionary, which distributes its mass over the most efficient, or informative, components. We implement a stochastic gradient descent algorithm, using the probability as a state variable and optimizing a multi-task goodness-of-fit criterion for classifiers based on variables randomly chosen according to ℙ. We then generate classifiers from the optimal distribution of weights learned on the training set. The method is first tested on several pattern recognition problems, including face detection, handwritten digit recognition, spam classification and micro-array analysis. We then compare our approach with other step-wise algorithms such as random forests or recursive feature elimination.
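The optimization idea in the abstract can be sketched in a few lines. The following is a hedged, illustrative reconstruction, not the paper's exact algorithm: it maintains a probability vector `p` over a feature dictionary, samples small feature subsets according to `p`, scores a simple nearest-centroid classifier (a stand-in for the classifiers used in the paper) on each subset, and reinforces the sampled features with a Robbins-Monro-style decreasing step size. All function names and the synthetic data below are invented for illustration.

```python
import numpy as np

def nearest_centroid_accuracy(Xs, y):
    """Training accuracy of a nearest-class-centroid rule on a feature subset."""
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return (classes[dists.argmin(axis=1)] == y).mean()

def learn_feature_weights(X, y, n_select=2, n_iters=3000, seed=0):
    """Sketch of a stochastic search for a distribution over features.

    At each iteration, sample n_select distinct features according to p,
    score a simple classifier on that subset, and multiplicatively
    reinforce the sampled features by how much the score beats a running
    baseline, with Robbins-Monro step sizes step_t = 10 / (10 + t).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    p = np.full(d, 1.0 / d)       # start uniform over the dictionary
    baseline = 0.5                # running average of subset scores
    for t in range(1, n_iters + 1):
        subset = rng.choice(d, size=n_select, replace=False, p=p)
        score = nearest_centroid_accuracy(X[:, subset], y)
        baseline += 0.05 * (score - baseline)
        step = 10.0 / (10.0 + t)  # decreasing step sizes (Robbins-Monro)
        p[subset] *= np.exp(step * (score - baseline))
        p /= p.sum()              # renormalize onto the simplex
    return p

# Synthetic data: only the first two of ten features carry class information.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=300)
X = rng.normal(size=(300, 10))
X[:, 0] += 3.0 * y                # informative feature
X[:, 1] -= 3.0 * y                # informative feature
p = learn_feature_weights(X, y)
print("learned feature weights:", np.round(p, 3))
```

On this toy problem the learned distribution concentrates its mass on the two informative features, mirroring the abstract's description of ℙ placing weight on the informative components of the dictionary.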
| Original language | English |
|---|---|
| Pages (from-to) | 509-547 |
| Number of pages | 39 |
| Journal | Journal of Machine Learning Research |
| Volume | 8 |
| Publication status | Published - 1 Mar 2007 |
| Externally published | Yes |
Keywords
- Classification algorithm
- Feature selection
- Pattern recognition
- Robbins-Monro application
- Stochastic learning algorithms