TY - GEN
T1 - Automatic Classification of Software Repositories
T2 - 29th International Conference on Evaluation and Assessment in Software Engineering, EASE 2025
AU - Balla, Stefano
AU - Degueule, Thomas
AU - Robbes, Romain
AU - Falleri, Jean-Rémy
AU - Zacchiroli, Stefano
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/12/24
Y1 - 2025/12/24
N2 - The rapid growth of software repositories on development platforms such as GitHub, as well as archives like Software Heritage, prompts the need for better repository classification. Machine learning is increasingly used to automate this classification, but there are no secondary studies analyzing this research landscape. We present a systematic mapping study of 43 primary sources published between 2002 and 2023, where we examine the goals, inputs, outputs, training, and evaluation processes involved in automatic repository classification. Our findings reveal a growing interest in automatic classification, particularly to enhance the discoverability and recommendation of relevant repositories. Other applications, such as classification for mining studies, were surprisingly underrepresented. We also observe that a lack of standardized datasets, classification tasks, and evaluation metrics makes it difficult to compare the performance of different techniques.
AB - The rapid growth of software repositories on development platforms such as GitHub, as well as archives like Software Heritage, prompts the need for better repository classification. Machine learning is increasingly used to automate this classification, but there are no secondary studies analyzing this research landscape. We present a systematic mapping study of 43 primary sources published between 2002 and 2023, where we examine the goals, inputs, outputs, training, and evaluation processes involved in automatic repository classification. Our findings reveal a growing interest in automatic classification, particularly to enhance the discoverability and recommendation of relevant repositories. Other applications, such as classification for mining studies, were surprisingly underrepresented. We also observe that a lack of standardized datasets, classification tasks, and evaluation metrics makes it difficult to compare the performance of different techniques.
KW - repository classification
KW - software repositories
KW - systematic mapping study
UR - https://www.scopus.com/pages/publications/105027152587
U2 - 10.1145/3756681.3756958
DO - 10.1145/3756681.3756958
M3 - Conference contribution
AN - SCOPUS:105027152587
T3 - Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering, EASE, 2025 edition, EASE 2025
SP - 102
EP - 113
BT - Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering, EASE, 2025 edition, EASE 2025
A2 - Babar, Muhammad Ali
A2 - Tosun, Ayse
A2 - Wagner, Stefan
A2 - Stray, Viktoria
PB - Association for Computing Machinery, Inc
Y2 - 17 June 2025 through 20 June 2025
ER -