Leveraging Machine Learning-Based PDF Malware Detection in Snort

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Portable Document Format (PDF) is a commonly used file format for exchanging and storing documents, images, and other types of data. Its popularity stems from its ability to preserve a document's original layout, fonts, and graphics, which makes it a common choice for sharing sensitive information such as financial reports, legal documents, and confidential data. This widespread adoption has also made PDFs an attractive target for attackers who exploit vulnerabilities in these documents to spread malware. Several solutions have been proposed to identify and mitigate threats embedded within PDF files, including signature-based detection and behavioral analysis, but these methods are often insufficient for detecting PDF-based threats. In this paper, we propose an approach that monitors incoming PDFs at the network entry point to identify patterns and anomalies indicative of malicious files. We use an ensemble Machine Learning-based detection system built on Random Forest, Support Vector Machine (SVM), and Gradient Boosting, which analyzes PDF features such as file size, metadata size, and obj and JavaScript keywords. We evaluate the approach on a separate dataset, where it achieves an accuracy of up to 92%. We demonstrate the model's explainability by creating a visualization to interpret its decisions. Finally, we integrate the resulting ML model as a new plugin in the Snort IDS detection engine, complementing its traditional rule-based detection mechanisms with ML-based analysis.
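As an illustration of the kind of static features the abstract names, the following minimal Python sketch computes file size, an approximate metadata size, and counts of obj and JavaScript keywords from raw PDF bytes. It is not the authors' implementation: the function name `extract_pdf_features`, the regular expressions, and the choice to measure metadata size as the total length of XMP metadata packets are all assumptions made for this example.

```python
import re


def extract_pdf_features(data: bytes) -> dict:
    """Compute simple static features from raw PDF bytes (illustrative only)."""
    # Indirect-object openers look like "12 0 obj"; "endobj" is not counted.
    obj_count = len(re.findall(rb"\d+\s+\d+\s+obj\b", data))

    # /JavaScript and /JS name tokens commonly mark embedded scripts.
    js_count = len(re.findall(rb"/JavaScript\b|/JS\b", data))

    # Assumption: "metadata size" is approximated by the total length of
    # XMP metadata packets; the paper may define it differently.
    xmp_packets = re.findall(rb"<x:xmpmeta.*?</x:xmpmeta>", data, flags=re.DOTALL)
    metadata_size = sum(len(packet) for packet in xmp_packets)

    return {
        "file_size": len(data),
        "metadata_size": metadata_size,
        "obj_count": obj_count,
        "js_count": js_count,
    }
```

A feature dictionary like this could be turned into a numeric vector and passed to an ensemble classifier. Plain keyword matching of this kind misses obfuscated names (PDF allows hex escapes such as `#61` inside names), so a practical extractor would need to normalize names first.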

Original language: English
Title of host publication: International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350391183
Publication status: Published - 1 Jan 2024
Event: 4th International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2024 - Male, Maldives
Duration: 4 Nov 2024 → 6 Nov 2024

Publication series

Name: International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2024

Conference

Conference: 4th International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2024
Country/Territory: Maldives
City: Male
Period: 4/11/24 → 6/11/24

Keywords

  • Cybersecurity
  • Machine Learning
  • PDF malware attack
  • Portable Document Format (PDF)
