TY - GEN
T1 - Cosmopolite Sound Monitoring (CoSMo)
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
AU - Angulo, Florian
AU - Essid, Slim
AU - Peeters, Geoffroy
AU - Mietlicki, Christophe
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - Measuring noise in cities and automatically identifying the corresponding sound sources is a crucial challenge for policymakers. Indeed, such information helps address noise pollution and improve the well-being of urban dwellers. In recent years, researchers have provided annotated datasets recorded in two major cities to foster the development of urban sound event detection (SED) systems. This paper presents an in-depth study of the behaviour of state-of-the-art SED systems well suited to our problem, combining three datasets of real far-field recordings that can be used jointly during training. In our evaluation, we highlight the performance gaps between simple and hard recording examples, based on the salience of sound events and the polyphony of the recordings. We provide new proximity annotations for this analysis. We evaluate the ability of urban SED systems to generalize across cities with varying degrees of training supervision. We show that such generalization is hindered mostly by the difficulty current urban SED systems have in detecting sound events with low salience and sound events in highly polyphonic soundscapes.
KW - Far-field urban audio recordings
KW - Sound Event Detection (SED)
KW - Urban sound monitoring
U2 - 10.1109/ICASSP49357.2023.10095833
DO - 10.1109/ICASSP49357.2023.10095833
M3 - Conference contribution
AN - SCOPUS:85177591196
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 June 2023 through 10 June 2023
ER -