TY - GEN
T1 - Exploratory landscape analysis is strongly sensitive to the sampling strategy
AU - Renau, Quentin
AU - Doerr, Carola
AU - Dreo, Johann
AU - Doerr, Benjamin
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Exploratory landscape analysis (ELA) supports supervised learning approaches for automated algorithm selection and configuration by providing sets of features that quantify the most relevant characteristics of the optimization problem at hand. In black-box optimization, where an explicit problem representation is not available, the feature values need to be approximated from a small number of sample points. In practice, uniformly sampled random point sets and Latin hypercube constructions are commonly used sampling strategies. In this work, we analyze how the sampling method and the sample size influence the quality of the feature value approximations and how this quality impacts the accuracy of a standard classification task. While, not unexpectedly, increasing the number of sample points gives more robust estimates for the feature values, to our surprise we find that the feature value approximations for different sampling strategies do not converge to the same value. This implies that approximated feature values cannot be interpreted independently of the underlying sampling strategy. As our classification experiments show, this also implies that the feature approximations used for training a classifier must stem from the same sampling strategy as those used for the actual classification tasks. As a side result we show that classifiers trained with feature values approximated by Sobol’ sequences achieve higher accuracy than any of the standard sampling techniques. This may indicate improvement potential for ELA-trained machine learning models.
KW - Automated algorithm design
KW - Black-box optimization
KW - Exploratory landscape analysis
KW - Feature extraction
UR - https://www.scopus.com/pages/publications/85091147240
U2 - 10.1007/978-3-030-58115-2_10
DO - 10.1007/978-3-030-58115-2_10
M3 - Conference contribution
AN - SCOPUS:85091147240
SN - 9783030581145
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 139
EP - 153
BT - Parallel Problem Solving from Nature – PPSN XVI - 16th International Conference, PPSN 2020, Proceedings
A2 - Bäck, Thomas
A2 - Preuss, Mike
A2 - Deutz, André
A2 - Emmerich, Michael
A2 - Wang, Hao
A2 - Doerr, Carola
A2 - Trautmann, Heike
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th International Conference on Parallel Problem Solving from Nature, PPSN 2020
Y2 - 5 September 2020 through 9 September 2020
ER -