TY - GEN
T1 - Learning and data selection in big datasets
AU - Ghadikolaei, Hossein S.
AU - Ghauch, Hadi
AU - Fischione, Carlo
AU - Skoglund, Mikael
N1 - Publisher Copyright: Copyright © 2019 ASME
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Finding a dataset of minimal cardinality to characterize the optimal parameters of a model is of paramount importance in machine learning and distributed optimization over a network. This paper investigates the compressibility of large datasets. More specifically, we propose a framework that jointly learns the input-output mapping as well as the most representative samples of the dataset (sufficient dataset). Our analytical results show that the cardinality of the sufficient dataset increases sub-linearly with respect to the original dataset size. Numerical evaluations of real datasets reveal a large compressibility, up to 95%, without a noticeable drop in the learnability performance, measured by the generalization error.
UR - https://www.scopus.com/pages/publications/85078292566
M3 - Conference contribution
AN - SCOPUS:85078292566
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 3848
EP - 3857
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
T2 - 36th International Conference on Machine Learning, ICML 2019
Y2 - 9 June 2019 through 15 June 2019
ER -