Building a Commit-level Dataset of Real-world Vulnerabilities

  • Alexis Challande
  • Robin David
  • Guénaël Renault

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

While CVEs have become a de facto standard for publishing advisories on vulnerabilities, the state of current CVE databases is lackluster. Moreover, CVE advisories alone are insufficient to bridge the gap with the vulnerability artifacts in the impacted program. As a result, the community lacks a public dataset of real-world vulnerabilities that provides this association. In this paper, we present a method that restores this missing link by analyzing the vulnerabilities from the Android Open Source Project (AOSP), an aggregate of more than 1,800 projects. It is a well-suited target for building a representative dataset of vulnerabilities, as it covers the full spectrum that may be encountered in a modern system where a variety of low-level and higher-level components interact. More specifically, our main contribution is a dataset of more than 1,900 vulnerabilities, associating generic metadata (e.g. vulnerability type, impact level) with their respective patches at the commit granularity (e.g. fix commit-id, affected files, source code language). Finally, we also augment this dataset by providing precompiled binaries for a subset of the vulnerabilities. These binaries enable various uses of the data, both for binary-only analysis and at the interface between source and binary. In addition to providing a common baseline benchmark, our dataset release supports the community in data-driven software security research.

Original language: English
Title of host publication: CODASPY 2022 - Proceedings of the 12th ACM Conference on Data and Application Security and Privacy
Publisher: Association for Computing Machinery, Inc
Pages: 101-106
Number of pages: 6
ISBN (Electronic): 9781450392204
DOIs
Publication status: Published - 14 Apr 2022
Externally published: Yes
Event: 12th ACM Conference on Data and Application Security and Privacy, CODASPY 2022 - Virtual, Online, United States
Duration: 24 Apr 2022 to 27 Apr 2022

Publication series

Name: CODASPY 2022 - Proceedings of the 12th ACM Conference on Data and Application Security and Privacy

Conference

Conference: 12th ACM Conference on Data and Application Security and Privacy, CODASPY 2022
Country/Territory: United States
City: Virtual, Online
Period: 24/04/22 to 27/04/22

Keywords

  • binary matching
  • dataset
  • patch detection
  • security vulnerabilities
  • vulnerability research
