Abstract
Finding a dataset of minimal cardinality to characterize the optimal parameters of a model is of paramount importance in machine learning and distributed optimization over a network. This paper investigates the compressibility of large datasets. More specifically, we propose a framework that jointly learns the input-output mapping as well as the most representative samples of the dataset (sufficient dataset). Our analytical results show that the cardinality of the sufficient dataset increases sub-linearly with respect to the original dataset size. Numerical evaluations of real datasets reveal a large compressibility, up to 95%, without a noticeable drop in the learnability performance, measured by the generalization error.
| Original language | English |
|---|---|
| Title | Learning and Data Selection in Big Datasets |
| Pages (from-to) | 2191-2200 |
| Number of pages | 10 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 97 |
| Status | Published - 1 Jan 2019 |
| Event | 36th International Conference on Machine Learning, ICML 2019, Long Beach, United States. Duration: 9 June 2019 → 15 June 2019 |