
How Dataset Diversity Affects Generalization in ML-Based NIDS

  • Telecom Sudparis
  • Institut Polytechnique de Paris

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Machine Learning-based Network Intrusion Detection Systems (ML-based NIDS) rely heavily on the quality of the datasets used for training and evaluation. However, widely used NIDS benchmarks often suffer from poor data diversity, which limits model generalization and undermines the reliability of evaluation protocols. While prior work has acknowledged this limitation, a systematic framework to quantify dataset diversity and analyze its relationship with performance is still missing. To address this gap, we introduce a structured approach for characterizing dataset diversity in ML-based NIDS, grounded in measurement theory. We distinguish three types of diversity—intra-class, inter-class, and domain-shift—and operationalize their measurement using established metrics such as the Vendi Score and the Jensen-Shannon divergence. Our empirical analysis on the CIC-IDS2018 dataset, spanning sixty diversity-controlled train–test experiments, provides new insights into the relationship between diversity and generalization and demonstrates the value of diversity-aware data sampling for improving evaluation reliability.
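The two metrics named above have standard definitions.

- **Vendi Score:** for n samples, it is the exponential of the Shannon entropy of the eigenvalues of K/n. Here K is a positive semidefinite similarity matrix with ones on the diagonal.
- **Jensen-Shannon divergence:** it is the average Kullback-Leibler divergence of two distributions from their equal-weight mixture.

The Python sketch below implements only these textbook definitions. Several choices in it are illustrative assumptions and do not describe the paper's feature representation, kernel, or measurement protocol:

- the Gaussian (RBF) kernel and its `gamma` parameter;
- the base-2 logarithm;
- the hypothetical helper names `rbf_kernel`, `vendi_score`, and `js_divergence`.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian similarity matrix (assumed kernel choice); diagonal entries are 1."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def vendi_score(K):
    """exp of the Shannon entropy of the eigenvalues of K/n (K symmetric PSD, unit diagonal)."""
    n = K.shape[0]
    eig = np.linalg.eigvalsh(K / n)
    eig = eig[eig > 1e-12]  # drop zero and tiny negative eigenvalues from round-off
    return float(np.exp(-np.sum(eig * np.log(eig))))


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two (unnormalized) histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0; b > 0 wherever a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) / np.log(base)

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))
```

These functions give easy sanity checks:

- n identical feature vectors give a Vendi Score of 1.
- n mutually dissimilar vectors give a score that approaches n.
- Two histograms with disjoint support give a base-2 Jensen-Shannon divergence of 1.

One hypothetical way to gauge domain shift is to compare, say, per-feature histograms of a train split and a test split with `js_divergence`. How the paper itself aggregates such values is not stated in this abstract.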

Original language: English
Title of host publication: Computer Security – ESORICS 2025 - 30th European Symposium on Research in Computer Security, Proceedings
Editors: Vincent Nicomette, Abdelmalek Benzekri, Nora Boulahia-Cuppens, Jaideep Vaidya
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 269-288
Number of pages: 20
ISBN (Print): 9783032078834
State: Published - 1 Jan 2026
Event: 30th European Symposium on Research in Computer Security, ESORICS 2025 - Toulouse, France
Duration: 22 Sept 2025 to 24 Sept 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16053 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 30th European Symposium on Research in Computer Security, ESORICS 2025
Country/Territory: France
City: Toulouse
Period: 22/09/25 to 24/09/25
