TY - GEN
T1 - A scalable audio fingerprint method with robustness to pitch-shifting
AU - Fenet, Sébastien
AU - Richard, Gaël
AU - Grenier, Yves
PY - 2011/1/1
Y1 - 2011/1/1
AB - Audio fingerprint techniques should be robust to a variety of distortions caused by noisy transmission channels or specific sound processing. Although most current techniques are robust to the majority of these distortions, their quasi-systematic reliance on a spectral representation may make them sensitive to pitch-shifting, because this distortion modifies the spectral content of the signal. In this paper, we propose a novel fingerprint technique that couples a hashing technique with a constant-Q transform (CQT)-based fingerprint and is strongly robust to pitch-shifting. We also pair this method with an efficient post-processing step that removes false alarms, and we present an adaptation of a database pruning technique to our specific context. We evaluated our approach on a real-life broadcast monitoring scenario. The analyzed data consisted of 120 hours of real radio broadcasts, which therefore contain all the distortions that would be found in an industrial context. The reference database consisted of 30,000 songs. Owing to its increased robustness to pitch-shifting, our method achieves an excellent detection score.
UR - https://www.scopus.com/pages/publications/84873588095
M3 - Conference contribution
AN - SCOPUS:84873588095
SN - 9780615548654
T3 - Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011
SP - 121
EP - 126
BT - Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011
PB - International Society for Music Information Retrieval
T2 - 12th International Society for Music Information Retrieval Conference, ISMIR 2011
Y2 - 24 October 2011 through 28 October 2011
ER -