Geodesic PCA versus log-PCA of histograms in the Wasserstein space

  • Elsa Cazelles
  • Vivien Seguy
  • Jérémie Bigot
  • Marco Cuturi
  • Nicolas Papadakis

Research output: Contribution to journal › Article › peer-review

Abstract

This paper is concerned with the statistical analysis of datasets whose elements are random histograms. For the purpose of learning principal modes of variation from such data, we consider the issue of computing the principal component analysis (PCA) of histograms with respect to the 2-Wasserstein distance between probability measures. To this end, we propose comparing the methods of log-PCA and geodesic PCA in the Wasserstein space as introduced in [J. Bigot et al., Ann. Inst. Henri Poincaré Probab. Stat., 53 (2017), pp. 1–26; V. Seguy and M. Cuturi, Principal geodesic analysis for probability measures under the optimal transport metric, in Advances in Neural Information Processing Systems 28, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, eds., Curran Associates, Inc., Red Hook, NY, 2015, pp. 3294–3302]. Geodesic PCA involves solving a nonconvex optimization problem. To solve it approximately, we propose a novel forward-backward algorithm. This allows us to give a detailed comparison between log-PCA and geodesic PCA of one-dimensional histograms, which we carry out using various datasets, and to stress the benefits and drawbacks of each method. We extend these results to two-dimensional data and compare both methods in that setting.
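
As a rough illustration of the one-dimensional setting mentioned in the abstract, the sketch below relies on a standard fact: for measures on the real line, the 2-Wasserstein distance equals the L2 distance between quantile functions, so a log-PCA-style analysis can be approximated by ordinary PCA on discretized quantile functions. This is not the authors' code, and it does not implement geodesic PCA or the forward-backward algorithm. The function names (`quantile_function`, `w2_distance`, `log_pca_1d`), the shared bin grid, and the midpoint discretization of (0, 1) are all assumptions made for the example.

```python
import numpy as np


def quantile_function(weights, centers, levels):
    """Generalized inverse CDF of a histogram, evaluated at the given levels."""
    weights = np.asarray(weights, dtype=float)
    cdf = np.cumsum(weights) / np.sum(weights)
    # First bin whose cumulative mass reaches each level.
    idx = np.searchsorted(cdf, levels, side="left")
    return centers[np.clip(idx, 0, len(centers) - 1)]


def midpoint_levels(n_levels):
    """Midpoints of a uniform partition of (0, 1) (assumed discretization)."""
    return (np.arange(n_levels) + 0.5) / n_levels


def w2_distance(w_a, w_b, centers, n_levels=1000):
    """Approximate 2-Wasserstein distance between two 1-D histograms."""
    levels = midpoint_levels(n_levels)
    q_a = quantile_function(w_a, centers, levels)
    q_b = quantile_function(w_b, centers, levels)
    return np.sqrt(np.mean((q_a - q_b) ** 2))


def log_pca_1d(histograms, centers, n_components=1, n_levels=1000):
    """Ordinary PCA on quantile functions (a log-PCA-style sketch in 1-D).

    Returns the mean quantile function (the quantile function of the
    Wasserstein barycenter in 1-D), the leading principal directions,
    and the variance carried by each of them.
    """
    levels = midpoint_levels(n_levels)
    Q = np.stack([quantile_function(h, centers, levels) for h in histograms])
    mean_q = Q.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Q - mean_q, full_matrices=False)
    variances = S ** 2 / (n_levels * len(histograms))
    return mean_q, Vt[:n_components], variances[:n_components]


# Toy data: five histograms, each a unit mass in a different bin.
centers = np.arange(5.0)
histograms = np.eye(5)
mean_q, components, variances = log_pca_1d(histograms, centers)
```

Because this sketch ignores the constraint that a quantile function must be nondecreasing, moving far along a component can leave the set of valid histograms; geodesic PCA, by contrast, is formulated directly in the Wasserstein space.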

Original language: English
Pages (from-to): B429-B456
Journal: SIAM Journal on Scientific Computing
Volume: 40
Issue number: 2
Publication status: Published - 1 Jan 2018
Externally published: Yes

Keywords

  • Geodesic principal component analysis
  • Nonconvex optimization
  • Wasserstein space
