TY - GEN
T1 - How Dataset Diversity Affects Generalization in ML-Based NIDS
AU - Nougnanke, Benoit
AU - Blanc, Gregory
AU - Robert, Thomas
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026/1/1
Y1 - 2026/1/1
N2 - Machine Learning-based Network Intrusion Detection Systems (ML-based NIDS) rely heavily on the quality of the datasets used for training and evaluation. However, widely used NIDS benchmarks often suffer from poor data diversity, which limits model generalization and undermines the reliability of evaluation protocols. While prior work has acknowledged this limitation, a systematic framework to quantify dataset diversity and analyze its relationship with performance is still missing. To address this gap, we introduce a structured approach for characterizing dataset diversity in ML-based NIDS, grounded in measurement theory. We distinguish three types of diversity—intra-class, inter-class, and domain-shift—and operationalize their measurement using established metrics such as the Vendi Score and the Jensen-Shannon divergence. Our empirical analysis on the CIC-IDS2018 dataset, spanning sixty diversity-controlled train–test experiments, provides new insights into the relationship between diversity and generalization and demonstrates the value of diversity-aware data sampling for improving evaluation reliability.
KW - Diversity
KW - Generalization
KW - Machine Learning
KW - Measurement Theory
KW - NIDS Datasets
KW - Performance Evaluation
UR - https://www.scopus.com/pages/publications/105020262125
U2 - 10.1007/978-3-032-07884-1_14
DO - 10.1007/978-3-032-07884-1_14
M3 - Conference contribution
AN - SCOPUS:105020262125
SN - 9783032078834
T3 - Lecture Notes in Computer Science
SP - 269
EP - 288
BT - Computer Security – ESORICS 2025 - 30th European Symposium on Research in Computer Security, Proceedings
A2 - Nicomette, Vincent
A2 - Benzekri, Abdelmalek
A2 - Boulahia-Cuppens, Nora
A2 - Vaidya, Jaideep
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th European Symposium on Research in Computer Security, ESORICS 2025
Y2 - 22 September 2025 through 24 September 2025
ER -