TY - JOUR
T1 - Practical and ready-to-use methodology to assess the re-identification risk in anonymized datasets
AU - Sondeck, Louis Philippe
AU - Laurent, Maryline
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12/1
Y1 - 2025/12/1
N2 - To prove that a dataset is sufficiently anonymized, many privacy policies suggest that a re-identification risk assessment be performed, but do not provide a precise methodology for doing so, leaving the industry alone with the problem. This paper proposes a practical and ready-to-use methodology for re-identification risk assessment, the originality of which is manifold: (1) it is the first to follow well-known risk analysis methods (e.g. EBIOS) that have been used in the cybersecurity field for years, which consider not only the ability to perform an attack, but also the severity such an attack can have on an individual; (2) it is the first to qualify attributes and values of attributes with e.g. degree of exposure, as known real-world attacks mainly target certain types of attributes and not others; (3) it is the first to provide clear, comprehensible criteria and interpretable, explainable assessment results. In addition, the fine granularity of the methodology makes it possible to score the risk as accurately as possible, and thus maintain good data quality at an acceptable risk, which is very promising for the AI industrial sector. Finally, the implementation of the methodology is illustrated using the publicly available Adult dataset, which was assessed as having a critical risk of re-identification, with 14 concrete cases of individualization.
KW - Anonymized dataset
KW - Privacy
KW - Privacy impact assessment
KW - Re-identification risk assessment
UR - https://www.scopus.com/pages/publications/105010090594
U2 - 10.1038/s41598-025-04907-3
DO - 10.1038/s41598-025-04907-3
M3 - Article
C2 - 40603887
AN - SCOPUS:105010090594
SN - 2045-2322
VL - 15
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 23223
ER -