Abstract
This work introduces a new framework for nonnegative matrix factorization (NMF) in multisensor or multimodal data configurations, where taking into account the mutual dependence that exists between the related parallel streams of data is expected to improve performance. In contrast with previous works that focused on co-factorization methods, where some factors are shared by the different modalities, we propose a soft co-factorization scheme which accounts for possible local discrepancies across modalities or channels. This objective is formalized as an optimization problem where concurrent factorizations are jointly performed while being tied by a coupling term that penalizes differences between the related factor matrices associated with different modalities. We provide majorization-minimization (MM) algorithms for three common measures of fit (the squared Euclidean norm, the Kullback-Leibler divergence and the Itakura-Saito divergence) and two possible coupling variants, using either the ℓ1 norm or the squared Euclidean norm of differences. The approach is shown to achieve promising performance in two audio-related tasks: multimodal speaker diarization using audiovisual data and audio source separation using stereo data.
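To make the idea of a coupling term concrete, the following is a minimal sketch of one of the configurations mentioned above: squared Euclidean fit with a squared Euclidean coupling between the activation matrices of two modalities. It is an illustration under stated assumptions, not the algorithm from the paper. The function name `soft_co_nmf`, the symbols `V1`, `V2`, `W1`, `W2`, `H1`, `H2`, the weight `lam`, and the choice to couple the activations (rather than the dictionaries) are all hypothetical. The updates are standard multiplicative rules for nonnegative quadratic problems, assumed here as a stand-in for the MM updates the abstract refers to.

```python
import numpy as np


def soft_co_nmf(V1, V2, rank, lam=1.0, n_iter=200, eps=1e-12, seed=0):
    """Illustrative soft co-factorization: V1 ~ W1 @ H1 and V2 ~ W2 @ H2.

    Minimizes (assumed objective, not taken from the paper):
        ||V1 - W1 H1||_F^2 + ||V2 - W2 H2||_F^2 + lam * ||H1 - H2||_F^2
    so H1 and H2 are encouraged, but not forced, to be equal.
    """
    if V1.shape[1] != V2.shape[1]:
        raise ValueError("both modalities need the same number of frames")
    rng = np.random.default_rng(seed)
    n_frames = V1.shape[1]
    # Strictly positive random initialization keeps all updates well defined.
    W1 = rng.random((V1.shape[0], rank)) + eps
    W2 = rng.random((V2.shape[0], rank)) + eps
    H1 = rng.random((rank, n_frames)) + eps
    H2 = rng.random((rank, n_frames)) + eps

    def cost():
        fit = np.sum((V1 - W1 @ H1) ** 2) + np.sum((V2 - W2 @ H2) ** 2)
        return fit + lam * np.sum((H1 - H2) ** 2)

    costs = []
    for _ in range(n_iter):
        # Dictionary updates: plain Euclidean NMF rules (no coupling on W).
        W1 *= (V1 @ H1.T) / (W1 @ H1 @ H1.T + eps)
        W2 *= (V2 @ H2.T) / (W2 @ H2 @ H2.T + eps)
        # Activation updates: the coupling adds lam * (other H) to the
        # numerator and lam * (own H) to the denominator.
        H1 *= (W1.T @ V1 + lam * H2) / (W1.T @ W1 @ H1 + lam * H1 + eps)
        H2 *= (W2.T @ V2 + lam * H1) / (W2.T @ W2 @ H2 + lam * H2 + eps)
        costs.append(cost())
    return W1, H1, W2, H2, costs
```

Setting `lam=0` in this sketch gives two independent factorizations, while a very large `lam` pushes `H1` and `H2` toward a shared factor, which is the hard co-factorization case the abstract contrasts with. Note that this simplified objective does not fix the scale of the factors: the penalty can be reduced by shrinking `H1` and `H2` while growing `W1` and `W2`, so a practical implementation would likely need to normalize or otherwise constrain the dictionaries.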
| Original language | English |
|---|---|
| Article number | 6908018 |
| Pages (from-to) | 5940-5949 |
| Number of pages | 10 |
| Journal | IEEE Transactions on Signal Processing |
| Volume | 62 |
| Issue number | 22 |
| Publication status | Published - 15 Nov 2014 |
| Externally published | Yes |
Keywords
- Co-factorization
- multimodal data
- nonnegative matrix factorization
- segmentation
- source separation