
The Software Heritage Open Science Ecosystem

  • Laboratoire de Probabilités et Modèles Aléatoires

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Software Heritage is the largest public archive of software source code and associated development history, as captured by modern version control systems. As of July 2023, it has archived more than 16 billion unique source code files from more than 250 million collaborative development projects. In this chapter, we describe the Software Heritage ecosystem, focusing on research and open science use cases. On the one hand, Software Heritage supports empirical research on software by materializing the development history of public code in a single Merkle directed acyclic graph (DAG). This giant graph of source code artifacts (files, directories, and commits) can be used, and has been used, to study repository forks, open source contributors, vulnerability propagation, software provenance tracking, source code indexing, and more. On the other hand, Software Heritage ensures the availability and guarantees the integrity of the source code of software artifacts used in any field that relies on software to conduct experiments, thereby contributing to research reproducibility. The source code used in scientific experiments can be archived (e.g., via integration with open-access repositories), referenced using persistent identifiers that allow downstream integrity checks, and linked to and from other scholarly digital artifacts.
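The Merkle structure and the integrity checks mentioned above can be illustrated with a small sketch. It assumes the SWHID (Software Heritage persistent identifier) convention in which a file ("content") identifier is the SHA-1 of a Git-style blob header followed by the raw bytes, and a directory identifier is computed like a Git tree over its entries' identifiers. The function names `content_swhid`, `flat_directory_swhid`, and `verify` are hypothetical, and the directory function is deliberately simplified: it only handles a flat directory of regular, non-executable files (no subdirectories, symlinks, or special modes). It is not the archive's own implementation.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Identifier for a file's raw bytes: SHA-1 over a Git-style blob header plus the content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def flat_directory_swhid(files: dict[str, bytes]) -> str:
    """Simplified Merkle step (hypothetical helper): hash a directory of regular
    files only, in the style of a Git tree. Each entry embeds the binary hash of
    its child, so any change to a file changes the directory identifier."""
    body = b""
    # For files-only directories, Git's entry order is plain byte order of the names
    # (code-point order of str names matches UTF-8 byte order).
    for name in sorted(files):
        child_hex = content_swhid(files[name]).rsplit(":", 1)[1]
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(child_hex)
    header = b"tree " + str(len(body)).encode() + b"\0"
    return "swh:1:dir:" + hashlib.sha1(header + body).hexdigest()


def verify(data: bytes, swhid: str) -> bool:
    """Downstream integrity check: recompute the identifier and compare."""
    return content_swhid(data) == swhid


if __name__ == "__main__":
    cited = content_swhid(b"hello world\n")
    print(cited)
    print(verify(b"hello world\n", cited))   # unchanged bytes
    print(verify(b"hello world!\n", cited))  # tampered bytes
    print(flat_directory_swhid({"README": b"hello world\n"}))
```

Because identifiers are intrinsic (derived only from the bytes and, for directories, from the children's identifiers), anyone holding a cited identifier can recheck retrieved code without trusting the archive that served it. This is the property that lets a persistent identifier double as an integrity check.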

Original language: English
Title of host publication: Software Ecosystems
Subtitle of host publication: Tooling and Analytics
Publisher: Springer International Publishing
Pages: 33-61
Number of pages: 29
ISBN (Electronic): 9783031360602
ISBN (Print): 9783031360596
DOIs:
Publication status: Published, 1 Jan 2023
