
Fast dictionary learning with a smoothed Wasserstein loss

  • Graduate School of Informatics
  • Université Paris Dauphine

Research output: Contribution to conference, Paper, Peer-reviewed

Abstract

We consider in this paper the dictionary learning problem when the observations are normalized histograms of features. This problem can be tackled using non-negative matrix factorization approaches, typically with Euclidean or Kullback-Leibler fitting errors. Because these fitting errors are separable and treat each feature on an equal footing, they are blind to any similarity the features may share. We assume in this work that we have prior knowledge of these features. To leverage this side information, we propose to use the Wasserstein (a.k.a. earth mover's or optimal transport) distance as the fitting error between each original point and its reconstruction, and we propose scalable algorithms to do so. Our methods build upon Fenchel duality and entropic regularization of Wasserstein distances, which improves not only speed but also computational stability. We apply these techniques to face images and text documents. We show in particular that we can learn dictionaries (topics) for bag-of-words representations of texts using words that may not have appeared in the original texts, or even words that come from a different language than that used in the texts.
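The entropic regularization mentioned in the abstract can be illustrated with the standard Sinkhorn scaling iteration. The sketch below is not the paper's dictionary learning algorithm (which also relies on Fenchel duality). It only computes a smoothed transport cost between two histograms. The function name `sinkhorn_cost`, the parameter `gamma`, the iteration count, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_cost(a, b, M, gamma=0.5, n_iter=1000):
    """Entropy-smoothed transport cost between histograms a and b.

    a, b   : 1-D non-negative arrays that each sum to 1
    M      : ground cost matrix, M[i, j] = cost of moving mass from bin i to bin j
    gamma  : regularization strength (illustrative default, not from the paper)
    n_iter : number of scaling iterations (illustrative default)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    M = np.asarray(M, dtype=float)

    K = np.exp(-M / gamma)           # Gibbs kernel built from the ground cost
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # rescale to match the column marginal b
        u = a / (K @ v)              # rescale to match the row marginal a

    P = u[:, None] * K * v[None, :]  # regularized transport plan
    return float(np.sum(P * M)), P


# Toy data (hypothetical): three features on a line, cost = distance between them.
positions = np.arange(3)
M = np.abs(positions[:, None] - positions[None, :])
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])

cost_ab, P = sinkhorn_cost(a, b, M)
```

Unlike a Euclidean or Kullback-Leibler error, the cost here depends on `M`, so moving mass between similar features is cheaper than moving it between dissimilar ones. Smaller values of `gamma` bring the result closer to the unregularized distance but typically need more iterations and can underflow in `np.exp`.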

Original language: English
Pages: 630-638
Number of pages: 9
Status: Published - 1 Jan 2016
Externally published: Yes
Event: 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016 - Cadiz, Spain
Duration: 9 May 2016 to 11 May 2016

Conference

Conference: 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016
Country/Territory: Spain
City: Cadiz
Period: 9/05/16 to 11/05/16
