Wild SBOMs: A Large-scale Dataset of Software Bills of Materials from Public Code

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Developers gain productivity by reusing readily available Free and Open Source Software (FOSS) components. Such reuse also brings difficulties, such as managing licenses, components, and the related security concerns. One approach to handling these difficulties is to use Software Bills of Materials (SBOMs). While there have been studies on the readiness of practitioners to embrace SBOMs and on the SBOM tools ecosystem, a large-scale study of SBOM practices based on SBOM files produced in the wild is still lacking. A starting point for such a study is a large dataset of SBOM files found in the wild. We introduce such a dataset, consisting of over 78 thousand unique SBOM files, deduplicated from those found in over 94 million repositories. We include metadata covering the standard and format used, the quality score generated by the tool sbomqs, the number of revisions, filenames, and provenance information. Finally, we give suggestions and examples of research that could bring new insights into assessing and improving real-world SBOM practices.

Original language: English
Title of host publication: Proceedings - 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories, MSR 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 164-168
Number of pages: 5
ISBN (Electronic): 9798331501839
Publication status: Published - 1 Jan 2025
Event: 22nd IEEE/ACM International Conference on Mining Software Repositories, MSR 2025 - Ottawa, Canada
Duration: 27 Apr 2025 to 29 Apr 2025

Publication series

Name: Proceedings - 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories, MSR 2025

Conference

Conference: 22nd IEEE/ACM International Conference on Mining Software Repositories, MSR 2025
Country/Territory: Canada
City: Ottawa
Period: 27/04/25 to 29/04/25

Keywords

  • SBOM dataset
  • SBOM scores
  • SBOM standards
  • SBOM usage in the wild
