TY - GEN
T1 - The ultimate Debian database
T2 - 7th IEEE Working Conference on Mining Software Repositories, MSR 2010, Co-located with the 2010 ACM/IEEE International Conference on Software Engineering, ICSE 2010
AU - Nussbaum, Lucas
AU - Zacchiroli, Stefano
PY - 2010/6/25
Y1 - 2010/6/25
N2 - FLOSS distributions like RedHat and Ubuntu require a lot more complex infrastructures than most other FLOSS projects. In the case of community-driven distributions like Debian, the development of such an infrastructure is often not very organized, leading to new data sources being added in an impromptu manner while hackers set up new services that gain acceptance in the community. Mixing and matching data is then harder than it should be, albeit being badly needed for Quality Assurance and data mining. Massive refactoring and integration is not a viable solution either, due to the constraints imposed by the bazaar development model. This paper presents the Ultimate Debian Database (UDD), which is the countermeasure adopted by the Debian project to the above "data hell". UDD gathers data from various data sources into a single, central SQL database, turning Quality Assurance needs that could not be easily implemented before into simple SQL queries. The paper also discusses the customs that have contributed to the data hell, the lessons learnt while designing UDD, and its applications and potentialities for data mining on FLOSS distributions.
AB - FLOSS distributions like RedHat and Ubuntu require a lot more complex infrastructures than most other FLOSS projects. In the case of community-driven distributions like Debian, the development of such an infrastructure is often not very organized, leading to new data sources being added in an impromptu manner while hackers set up new services that gain acceptance in the community. Mixing and matching data is then harder than it should be, albeit being badly needed for Quality Assurance and data mining. Massive refactoring and integration is not a viable solution either, due to the constraints imposed by the bazaar development model. This paper presents the Ultimate Debian Database (UDD), which is the countermeasure adopted by the Debian project to the above "data hell". UDD gathers data from various data sources into a single, central SQL database, turning Quality Assurance needs that could not be easily implemented before into simple SQL queries. The paper also discusses the customs that have contributed to the data hell, the lessons learnt while designing UDD, and its applications and potentialities for data mining on FLOSS distributions.
KW - Data mining
KW - Data warehouse
KW - Distribution
KW - Open source
KW - Quality assurance
UR - https://www.scopus.com/pages/publications/77953804702
U2 - 10.1109/MSR.2010.5463277
DO - 10.1109/MSR.2010.5463277
M3 - Conference contribution
AN - SCOPUS:77953804702
SN - 9781424468034
T3 - Proceedings - International Conference on Software Engineering
SP - 52
EP - 61
BT - Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories, MSR 2010, Co-located with ICSE 2010
Y2 - 2 May 2010 through 3 May 2010
ER -