Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent

Authors: Jason Altschuler, Sinho Chewi, Patrik R Gerber, Austin Stromme

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Although the objective is geodesically non-convex, Riemannian GD empirically converges rapidly, in fact faster than off-the-shelf methods such as Euclidean GD and SDP solvers. [...] In Section 3, we show that for the Wasserstein barycenter problem, Riemannian GD enjoys dimension-free convergence rates (Theorem 2). We make several comments to contextualize this result. First, our result eliminates the exponential dimension dependence of state-of-the-art convergence rates [Che+20], which aligns with the empirical performance of this algorithm (see Figure 1).
Researcher Affiliation | Academia | Jason M. Altschuler (MIT, jasonalt@mit.edu); Sinho Chewi (MIT, schewi@mit.edu); Patrik Gerber (MIT, prgerber@mit.edu); Austin J. Stromme (MIT, astromme@mit.edu)
Pseudocode | Yes | Algorithm 1 (GD for Barycenters), Algorithm 2 (SGD for Barycenters), Algorithm 3 (GD for Regularized Barycenters), Algorithm 4 (Smoothed GD for Median). (Illustrative sketches of Algorithms 1 and 2 appear below the table.)
Open Source Code | Yes | Code reproducing all experiments in this paper can be found at https://github.com/PatrikGerber/Bures-Barycenters.
Open Datasets | No | The paper uses synthetically generated data based on specified parameters (e.g., 'n = 50 covariance matrices of dimension d = 50, each with condition number κ = 1000. Their eigenspaces are independent Haar distributed, and their eigenvalues are equally spaced in the interval [λmin, λmax] = [0.03, 30]'), but does not refer to or provide access information for a publicly available or open dataset. (A sketch of this generation procedure appears below the table.)
Dataset Splits | No | The paper describes generating synthetic data and running experiments (e.g., 'We run 50 experiments and plot the average accuracy'), but it does not provide specific details on how these generated data are partitioned into training, validation, and test sets.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory specifications, or cloud instance types).
Software Dependencies | No | The paper mentions external solvers like SCS [ODo+16; ODo+19] and MOSEK [MOS21] for comparison, but it does not specify version numbers for the software dependencies used in their own implementation of the algorithms.
Experiment Setup | Yes | In Figure 2 we compare Riemannian and Euclidean GD on a dataset consisting of n = 50 covariance matrices of dimension d = 50, each with condition number κ = 1000. Their eigenspaces are independent Haar distributed, and their eigenvalues are equally spaced in the interval [λmin, λmax] = [0.03, 30]. [...] In our experiments, Riemannian SGD was competitive on a wide range of problems with η = 1/t. [...] In Figure 5 we plot the suboptimality gap for the unregularized objective F = ∫ W2(·, Σ) dP as we optimize Fε using Algorithm 4 for various ε. The covariance matrices are generated as in Figure 2, with n = d = 30 and [λmin, λmax] = [0.01, 10]. (The η = 1/t schedule is shown in the SGD sketch below.)
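
To make the reviewed algorithm concrete, here is a minimal sketch of the Riemannian GD update on the Bures-Wasserstein manifold for centered Gaussians, in the spirit of Algorithm 1. The function names (`sqrtm_psd`, `transport_map`, `bw_gd_step`) and the step-size handling are illustrative assumptions, not the authors' code; see the linked repository for the actual implementation.

```python
import numpy as np

def sqrtm_psd(A):
    # Symmetric PSD matrix square root via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def transport_map(S, Si):
    # Optimal transport map from N(0, S) to N(0, Si), as a matrix:
    # T = S^{-1/2} (S^{1/2} Si S^{1/2})^{1/2} S^{-1/2}.
    rS = sqrtm_psd(S)
    rS_inv = np.linalg.inv(rS)
    return rS_inv @ sqrtm_psd(rS @ Si @ rS) @ rS_inv

def bw_gd_step(S, covs, eta=1.0):
    # One Riemannian GD step for the barycenter objective: average the
    # transport maps and retract along them (eta = 1 recovers the
    # classical fixed-point iteration for Gaussian barycenters).
    d = S.shape[0]
    T_bar = np.mean([transport_map(S, Si) for Si in covs], axis=0)
    T = (1.0 - eta) * np.eye(d) + eta * T_bar
    return T @ S @ T
```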
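The synthetic inputs described in the experiment setup (Haar-distributed eigenspaces, equally spaced eigenvalues in [λmin, λmax]) can be reproduced along the following lines; the helper name and the seed are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def random_covariance(d, lam_min, lam_max, rng):
    # Haar-distributed eigenbasis: QR of a Gaussian matrix, with the
    # standard sign correction so Q is exactly Haar distributed.
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q = Q * np.sign(np.diag(R))
    lams = np.linspace(lam_min, lam_max, d)  # equally spaced eigenvalues
    return (Q * lams) @ Q.T

rng = np.random.default_rng(0)  # seed is arbitrary
# n = 50 matrices, d = 50, condition number 30 / 0.03 = 1000 as in Figure 2
covs = [random_covariance(50, 0.03, 30.0, rng) for _ in range(50)]
```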
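For the η = 1/t schedule mentioned in the experiment setup, a stochastic variant in the spirit of Algorithm 2 replaces the averaged map with a single sampled one. This sketch reuses `transport_map` from the first snippet above and is likewise an illustrative assumption rather than the authors' implementation.

```python
def bw_sgd(covs, iters, rng):
    # Riemannian SGD sketch: one uniformly sampled input covariance per
    # step, with step size eta_t = 1/t as in the reported experiments.
    S = covs[0].copy()  # arbitrary initialization at one of the inputs
    d = S.shape[0]
    for t in range(1, iters + 1):
        Si = covs[rng.integers(len(covs))]
        eta = 1.0 / t
        T = (1.0 - eta) * np.eye(d) + eta * transport_map(S, Si)
        S = T @ S @ T
    return S
```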