TY - GEN
T1 - Generative Deep Learning for Internet of Things Network Traffic Generation
AU - Shahid, Mustafizur R.
AU - Blanc, Gregory
AU - Jmila, Houda
AU - Zhang, Zonghua
AU - Debar, Herve
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/1
Y1 - 2020/12/1
N2 - The rapid development of the Internet of Things (IoT) has prompted recent interest in realistic IoT network traffic generation. Security practitioners need IoT network traffic data to develop and assess network-based intrusion detection systems (NIDS). Emulating realistic network traffic avoids the costly physical deployment of thousands of smart devices. From an attacker's perspective, generating network traffic that mimics the legitimate behavior of a device can be useful to evade NIDS. As network traffic data consist of sequences of packets, the problem is similar to the generation of sequences of categorical data, such as word-by-word text generation. Many solutions in the field of natural language processing have been proposed to adapt a Generative Adversarial Network (GAN) to generate sequences of categorical data. In this paper, we propose to combine an autoencoder with a GAN to generate sequences of packet sizes that correspond to bidirectional flows. First, the autoencoder is trained to learn a latent representation of the real sequences of packet sizes. A GAN is then trained on the latent space to learn to generate latent vectors that can be decoded into realistic sequences. For experimental purposes, bidirectional flows produced by a Google Home Mini are used, and the autoencoder is combined with a Wasserstein GAN. Comparison of different network characteristics shows that our proposed approach is able to generate sequences of packet sizes that behave similarly to real bidirectional flows. We also show that the synthetic bidirectional flows are close enough to the real ones that they can fool anomaly detectors into labeling them as legitimate.
KW - Autoencoder
KW - Deep Learning
KW - Generative Adversarial Network
KW - Internet of Things
KW - Network Security
U2 - 10.1109/PRDC50213.2020.00018
DO - 10.1109/PRDC50213.2020.00018
M3 - Conference contribution
AN - SCOPUS:85099880031
T3 - Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC
SP - 70
EP - 79
BT - Proceedings - 2020 IEEE 25th Pacific Rim International Symposium on Dependable Computing, PRDC 2020
PB - IEEE Computer Society
T2 - 25th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2020
Y2 - 1 December 2020 through 4 December 2020
ER -