TY - GEN
T1 - Leveraging Machine Learning-Based PDF Malware Detection in Snort
AU - Chbib, Fadlallah
AU - Mustafa, Ali
AU - Khatoun, Rida
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - In the current digital era, the Portable Document Format (PDF) is a commonly used file format for exchanging and storing documents, images, and other data types. The format's popularity stems from its ability to preserve the original document's layout, fonts, and graphics, making it an ideal choice for sharing sensitive information such as financial reports, legal documents, and confidential data. However, this widespread adoption has also made PDFs an attractive target for attackers who seek to exploit vulnerabilities in these documents to spread malware. Several solutions have been proposed to identify and mitigate threats embedded within PDF files, including signature-based detection and behavioral analysis. However, these methods are often insufficient for detecting PDF-based threats. In this paper, we propose an approach that monitors incoming PDFs to identify patterns and anomalies indicative of malicious files. We use an ensemble Machine Learning-based detection system built on Random Forest, Support Vector Machine (SVM), and Gradient Boosting, which analyzes various PDF features, such as file size, metadata size, obj, and JavaScript, at the network entry point. We evaluate the algorithm's performance on a separate dataset, where our approach achieves an accuracy of up to 92%. We demonstrate the model's explainability by creating a visualization to interpret its decisions. Finally, we integrate the resulting ML model as a new plugin in the Snort IDS detection engine, enhancing its capabilities by adding ML-based analysis techniques to its traditional rule-based detection mechanisms.
AB - In the current digital era, the Portable Document Format (PDF) is a commonly used file format for exchanging and storing documents, images, and other data types. The format's popularity stems from its ability to preserve the original document's layout, fonts, and graphics, making it an ideal choice for sharing sensitive information such as financial reports, legal documents, and confidential data. However, this widespread adoption has also made PDFs an attractive target for attackers who seek to exploit vulnerabilities in these documents to spread malware. Several solutions have been proposed to identify and mitigate threats embedded within PDF files, including signature-based detection and behavioral analysis. However, these methods are often insufficient for detecting PDF-based threats. In this paper, we propose an approach that monitors incoming PDFs to identify patterns and anomalies indicative of malicious files. We use an ensemble Machine Learning-based detection system built on Random Forest, Support Vector Machine (SVM), and Gradient Boosting, which analyzes various PDF features, such as file size, metadata size, obj, and JavaScript, at the network entry point. We evaluate the algorithm's performance on a separate dataset, where our approach achieves an accuracy of up to 92%. We demonstrate the model's explainability by creating a visualization to interpret its decisions. Finally, we integrate the resulting ML model as a new plugin in the Snort IDS detection engine, enhancing its capabilities by adding ML-based analysis techniques to its traditional rule-based detection mechanisms.
KW - Cybersecurity
KW - Machine Learning
KW - PDF malware attack
KW - Portable Document Format (PDF)
UR - https://www.scopus.com/pages/publications/85215972301
U2 - 10.1109/ICECCME62383.2024.10796480
DO - 10.1109/ICECCME62383.2024.10796480
M3 - Conference contribution
AN - SCOPUS:85215972301
T3 - International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2024
BT - International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2024
Y2 - 4 November 2024 through 6 November 2024
ER -