TY - GEN
T1 - Software Artifact Mining in Software Engineering Conferences
T2 - 16th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2022
AU - Khalil, Zeinab Abou
AU - Zacchiroli, Stefano
N1 - Publisher Copyright: © 2022 Association for Computing Machinery.
PY - 2022/9/19
Y1 - 2022/9/19
N2 - Background: Software development results in the production of various types of artifacts: source code, version control system metadata, bug reports, mailing list conversations, test data, etc. Empirical software engineering (ESE) has thrived mining those artifacts to uncover the inner workings of software development and improve its practices. But which artifacts are studied in the field is a moving target, which we study empirically in this paper. Aims: We quantitatively characterize the most frequently mined and co-mined software artifacts in ESE research and the research purposes they support. Method: We conduct a meta-analysis of artifact mining studies published in 11 top conferences in ESE, for a total of 9621 papers. We use natural language processing (NLP) techniques to characterize the types of software artifacts that are most often mined and their evolution over a 16-year period (2004-2020). We analyze the combinations of artifact types that are most often mined together, as well as the relationship between study purposes and mined artifacts. Results: We find that: (1) mining happens in the vast majority of analyzed papers, (2) source code and test data are the most mined artifacts, (3) there is an increasing interest in mining novel artifacts, together with source code, (4) researchers are most interested in the evaluation of software systems and use all possible empirical signals to support that goal. Conclusions: Our study presents a meta-analysis of the usage of software artifacts in the field over a period of 16 years using NLP techniques.
KW - Academic conferences
KW - Meta-analysis
KW - Mining software repository
KW - Research trends
KW - Software artifacts
KW - Systematic mapping
U2 - 10.1145/3544902.3546239
DO - 10.1145/3544902.3546239
M3 - Conference contribution
AN - SCOPUS:85139843255
T3 - International Symposium on Empirical Software Engineering and Measurement
SP - 227
EP - 237
BT - Proceedings of the 16th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2022
A2 - Madeiral, Fernanda
A2 - Lassenius, Casper
A2 - Conte, Tayana
A2 - Mannisto, Tomi
PB - IEEE Computer Society
Y2 - 18 September 2022 through 23 September 2022
ER -