TY - JOUR
T1 - Geodesic PCA versus log-PCA of histograms in the Wasserstein space
AU - Cazelles, Elsa
AU - Seguy, Vivien
AU - Bigot, Jérémie
AU - Cuturi, Marco
AU - Papadakis, Nicolas
N1 - Publisher Copyright: © 2018 Society for Industrial and Applied Mathematics.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - This paper is concerned with the statistical analysis of datasets whose elements are random histograms. For the purpose of learning principal modes of variation from such data, we consider the issue of computing the principal component analysis (PCA) of histograms with respect to the 2-Wasserstein distance between probability measures. To this end, we propose comparing the methods of log-PCA and geodesic PCA in the Wasserstein space as introduced in [J. Bigot et al., Ann. Inst. Henri Poincaré Probab. Stat., 53 (2017), pp. 1–26; V. Seguy and M. Cuturi, Principal geodesic analysis for probability measures under the optimal transport metric, in Advances in Neural Information Processing Systems 28, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, eds., Curran Associates, Inc., Red Hook, NY, 2015, pp. 3294–3302]. Geodesic PCA involves solving a nonconvex optimization problem. To solve it approximately, we propose a novel forward-backward algorithm. This allows us to give a detailed comparison between log-PCA and geodesic PCA of one-dimensional histograms, which we carry out using various datasets, and to stress the benefits and drawbacks of each method. We extend these results for two-dimensional data and compare both methods in that setting.
KW - Geodesic principal component analysis
KW - Nonconvex optimization
KW - Wasserstein space
U2 - 10.1137/17M1143459
DO - 10.1137/17M1143459
M3 - Article
AN - SCOPUS:85046805671
SN - 1064-8275
VL - 40
SP - B429
EP - B456
JO - SIAM Journal on Scientific Computing
JF - SIAM Journal on Scientific Computing
IS - 2
ER -