TY - GEN
T1 - A generic system for audio indexing
T2 - 10th International Conference on Digital Audio Effects, DAFx 2007
AU - Peeters, Geoffroy
PY - 2007/1/1
Y1 - 2007/1/1
AB - In this paper we present a generic system for audio indexing (classification/segmentation) and apply it to two common problems: speech/music segmentation and music genre recognition. We first present some requirements for the design of a generic system. The training part of the system is based on a succession of four steps: feature extraction, feature selection, feature space transform and statistical modeling. We then propose several approaches for the indexing part, depending on the local/global characteristics of the indexes to be found. In particular, we propose the use of segment-statistical models. The system is then applied to the two problems. The first is the speech/music segmentation of a radio stream. The application is developed in a real industrial framework using real-world categories and data. The performance obtained on the pure speech/music classes is good. However, when the non-pure categories (mixed, bed) are also considered, the performance of the system drops. The second problem is music genre recognition. Since the indexes to be found are global, "segment-statistical models" are used, leading to results close to the state of the art.
M3 - Conference contribution
AN - SCOPUS:84872729314
SN - 9788890147913
T3 - Proceedings of the International Conference on Digital Audio Effects, DAFx
SP - 205
EP - 212
BT - Proceedings of the 10th International Conference on Digital Audio Effects, DAFx 2007
Y2 - 10 September 2007 through 15 September 2007
ER -