TY - GEN
T1 - Estimating g-Leakage via Machine Learning
AU - Romanelli, Marco
AU - Chatzikokolakis, Konstantinos
AU - Palamidessi, Catuscia
AU - Piantanida, Pablo
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/10/30
Y1 - 2020/10/30
N2 - This paper considers the problem of estimating the information leakage of a system in the black-box scenario, i.e. when the system's internals are unknown to the learner, or too complicated to analyze, and the only available information are pairs of input-output data samples, obtained by submitting queries to the system or provided by a third party. The frequentist approach relies on counting the frequencies to estimate the input-output conditional probabilities, however this method is not accurate when the domain of possible outputs is large. To overcome this difficulty, the estimation of the Bayes error of the ideal classifier was recently investigated using Machine Learning (ML) models, and it has been shown to be more accurate thanks to the ability of those models to learn the input-output correspondence. However, the Bayes vulnerability is only suitable to describe one-try attacks. A more general and flexible measure of leakage is the g-vulnerability, which encompasses several different types of adversaries, with different goals and capabilities. We propose a novel approach to perform black-box estimation of the g-vulnerability using ML which does not require to estimate the conditional probabilities and is suitable for a large class of ML algorithms. First, we formally show the learnability for all data distributions. Then, we evaluate the performance via various experiments using k-Nearest Neighbors and Neural Networks. Our approach outperform the frequentist one when the observables domain is large.
AB - This paper considers the problem of estimating the information leakage of a system in the black-box scenario, i.e. when the system's internals are unknown to the learner, or too complicated to analyze, and the only available information are pairs of input-output data samples, obtained by submitting queries to the system or provided by a third party. The frequentist approach relies on counting the frequencies to estimate the input-output conditional probabilities, however this method is not accurate when the domain of possible outputs is large. To overcome this difficulty, the estimation of the Bayes error of the ideal classifier was recently investigated using Machine Learning (ML) models, and it has been shown to be more accurate thanks to the ability of those models to learn the input-output correspondence. However, the Bayes vulnerability is only suitable to describe one-try attacks. A more general and flexible measure of leakage is the g-vulnerability, which encompasses several different types of adversaries, with different goals and capabilities. We propose a novel approach to perform black-box estimation of the g-vulnerability using ML which does not require to estimate the conditional probabilities and is suitable for a large class of ML algorithms. First, we formally show the learnability for all data distributions. Then, we evaluate the performance via various experiments using k-Nearest Neighbors and Neural Networks. Our approach outperform the frequentist one when the observables domain is large.
KW - g-vulnerability estimation
KW - machine learning
KW - neural networks
U2 - 10.1145/3372297.3423363
DO - 10.1145/3372297.3423363
M3 - Conference contribution
AN - SCOPUS:85096188761
T3 - Proceedings of the ACM Conference on Computer and Communications Security
SP - 697
EP - 716
BT - CCS 2020 - Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security
PB - Association for Computing Machinery
T2 - 27th ACM SIGSAC Conference on Computer and Communications Security, CCS 2020
Y2 - 9 November 2020 through 13 November 2020
ER -