TY - GEN
T1 - DARKGAN
T2 - 22nd International Society for Music Information Retrieval Conference, ISMIR 2021
AU - Nistal, Javier
AU - Lattner, Stefan
AU - Richard, Gaël
N1 - Publisher Copyright:
© 2021 Proceedings of the 22nd International Conference on Music Information Retrieval, ISMIR 2021. All Rights Reserved.
PY - 2021/1/1
Y1 - 2021/1/1
AB - Generative Adversarial Networks (GANs) have achieved excellent audio synthesis quality in recent years. However, making them operable with semantically meaningful controls remains an open challenge. An obvious approach is to control the GAN by conditioning it on metadata contained in audio datasets. Unfortunately, audio datasets often lack the desired annotations, especially in the musical domain. One way to circumvent this lack of annotations is to generate them, for example, with an automatic audio-tagging system. The output probabilities of such systems (so-called "soft labels") carry rich information about the characteristics of the corresponding audio signals and can be used to distill the knowledge from a teacher model into a student model. In this work, we perform knowledge distillation from a large audio-tagging system into an adversarial audio synthesizer that we call DarkGAN. Results show that DarkGAN can synthesize musical audio with acceptable quality and exhibits moderate attribute control even with out-of-distribution input conditioning. We release the code and provide audio examples on the accompanying website.
UR - https://www.scopus.com/pages/publications/85137790100
M3 - Conference contribution
AN - SCOPUS:85137790100
T3 - Proceedings of the 22nd International Conference on Music Information Retrieval, ISMIR 2021
SP - 484
EP - 492
BT - ISMIR 2021 - The International Society For Music Information Retrieval Conference, Proceedings
PB - International Society for Music Information Retrieval
Y2 - 7 November 2021 through 12 November 2021
ER -