TY - GEN
T1 - Cover detection using dominant melody embeddings
AU - Doras, Guillaume
AU - Peeters, Geoffroy
N1 - Publisher Copyright:
© 2020 International Society for Music Information Retrieval. All rights reserved.
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Automatic cover detection - the task of finding in an audio database all the covers of one or several query tracks - has long been seen as a challenging theoretical problem in the MIR community and as an acute practical problem for authors' and composers' societies. Original algorithms proposed for this task have proven their accuracy on small datasets, but are unable to scale up to modern real-life audio corpora. On the other hand, faster approaches designed to process thousands of pairwise comparisons resulted in lower accuracy, making them unsuitable for practical use. In this work, we propose a neural network architecture that is trained to represent each track as a single embedding vector. The computational burden is thus shifted to the embedding extraction, which can be conducted offline and stored, while the pairwise comparison task reduces to a simple Euclidean distance computation. We further propose to extract each track's embedding from its dominant melody representation, obtained by another neural network trained for this task. We then show that this architecture improves state-of-the-art accuracy on both small and large datasets, and is able to scale to query databases of thousands of tracks in a few seconds.
AB - Automatic cover detection - the task of finding in an audio database all the covers of one or several query tracks - has long been seen as a challenging theoretical problem in the MIR community and as an acute practical problem for authors' and composers' societies. Original algorithms proposed for this task have proven their accuracy on small datasets, but are unable to scale up to modern real-life audio corpora. On the other hand, faster approaches designed to process thousands of pairwise comparisons resulted in lower accuracy, making them unsuitable for practical use. In this work, we propose a neural network architecture that is trained to represent each track as a single embedding vector. The computational burden is thus shifted to the embedding extraction, which can be conducted offline and stored, while the pairwise comparison task reduces to a simple Euclidean distance computation. We further propose to extract each track's embedding from its dominant melody representation, obtained by another neural network trained for this task. We then show that this architecture improves state-of-the-art accuracy on both small and large datasets, and is able to scale to query databases of thousands of tracks in a few seconds.
M3 - Conference contribution
AN - SCOPUS:85087096926
T3 - Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019
SP - 107
EP - 114
BT - Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019
A2 - Flexer, Arthur
A2 - Peeters, Geoffroy
A2 - Urbano, Julian
A2 - Volk, Anja
PB - International Society for Music Information Retrieval
T2 - 20th International Society for Music Information Retrieval Conference, ISMIR 2019
Y2 - 4 November 2019 through 8 November 2019
ER -