
The Debsources Dataset: two decades of free and open source software

Research output: Contribution to journal › Article › Peer-review

Abstract

We present the Debsources Dataset: source code and related metadata spanning two decades of Free and Open Source Software (FOSS) history, seen through the lens of the Debian distribution. The dataset spans more than 3 billion lines of source code, as well as metadata about them such as: size metrics (lines of code, disk usage), developer-defined symbols (ctags), file-level checksums (SHA1, SHA256, TLSH), file media types (MIME), release information (which version of which package, containing which source code files, was released when), and license information (GPL, BSD, etc.). The Debsources Dataset comes as a set of tarballs containing deduplicated unique source code files organized by their SHA1 checksums (the source code), plus a portable PostgreSQL database dump (the metadata). A case study shows how the Debsources Dataset can be used to easily and efficiently instrument very long-term analyses of the evolution of Debian from various angles (size, granularity, licensing, etc.), getting a grasp of major FOSS trends of the past two decades. The Debsources Dataset is Open Data, released under the terms of the CC BY-SA 4.0 license, and available for download from Zenodo with DOI reference 10.5281/zenodo.61089.
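Storing "deduplicated unique source code files organized by their SHA1 checksums" is content-addressed storage. Each distinct file content is kept once, under a name derived from its hash. Identical files shipped by many package versions then collapse to a single blob, and a separate index (here, the PostgreSQL metadata) records which file belongs to which package. The sketch below illustrates the idea. It is not the Debsources tooling. The sharded path layout in `store_path` and the helper names are hypothetical, and the actual tarball layout may differ.

```python
import hashlib
from pathlib import Path


def sha1_of(data: bytes) -> str:
    """Return the hex SHA1 digest of a file's raw bytes."""
    return hashlib.sha1(data).hexdigest()


def store_path(digest: str) -> Path:
    # Hypothetical sharded layout: <2 hex chars>/<next 2 hex chars>/<full digest>.
    # The real Debsources tarball layout is not specified here and may differ.
    return Path(digest[:2]) / digest[2:4] / digest


def deduplicate(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Content-addressed deduplication keyed by SHA1.

    Returns (blobs, index):
      blobs maps a storage path to file content, one entry per unique content;
      index maps each original file name to its SHA1 digest, i.e. the kind of
      file-to-checksum metadata a separate database would keep.
    """
    blobs: dict[str, bytes] = {}
    index: dict[str, str] = {}
    for name, data in files.items():
        digest = sha1_of(data)
        # setdefault stores the content only the first time this digest is seen.
        blobs.setdefault(str(store_path(digest)), data)
        index[name] = digest
    return blobs, index
```

For example, two package versions that ship an identical `main.c` yield one stored blob for that file. Both file names still point to the same digest in the index:

```python
files = {
    "pkg-1.0/main.c": b"int main(void) { return 0; }\n",
    "pkg-1.1/main.c": b"int main(void) { return 0; }\n",
    "pkg-1.1/util.c": b"void f(void) {}\n",
}
blobs, index = deduplicate(files)
```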

Original language: English
Pages (from-to): 1405-1437
Number of pages: 33
Journal: Empirical Software Engineering
Volume: 22
Issue number: 3
State: Published, 1 June 2017
