Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent
Authors: Jason Altschuler, Sinho Chewi, Patrik R. Gerber, Austin Stromme
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Although the objective is geodesically non-convex, Riemannian GD empirically converges rapidly, in fact faster than off-the-shelf methods such as Euclidean GD and SDP solvers. [...] In Section 3, we show that for the Wasserstein barycenter problem, Riemannian GD enjoys dimension-free convergence rates (Theorem 2). We make several comments to contextualize this result. First, our result eliminates the exponential dimension dependence of state-of-the-art convergence rates [Che+20], which aligns with the empirical performance of this algorithm (see Figure 1). |
| Researcher Affiliation | Academia | Jason M. Altschuler (MIT, jasonalt@mit.edu); Sinho Chewi (MIT, schewi@mit.edu); Patrik Gerber (MIT, prgerber@mit.edu); Austin J. Stromme (MIT, astromme@mit.edu) |
| Pseudocode | Yes | Algorithm 1 GD for Barycenters, Algorithm 2 SGD for Barycenters, Algorithm 3 GD for Regularized Barycenters, Algorithm 4 Smoothed GD for Median (a minimal sketch of the Riemannian GD update appears after the table) |
| Open Source Code | Yes | Code reproducing all experiments in this paper can be found at https://github.com/PatrikGerber/Bures-Barycenters. |
| Open Datasets | No | The paper uses synthetically generated data based on specified parameters (e.g., 'n = 50 covariance matrices of dimension d = 50, each with condition number κ = 1000. Their eigenspaces are independent Haar distributed, and their eigenvalues are equally spaced in the interval [λmin, λmax] = [0.03, 30]'), but does not refer to or provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes generating synthetic data and running experiments (e.g., 'We run 50 experiments and plot the average accuracy'), but it does not provide specific details on how these generated data are partitioned into training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory specifications, or cloud instance types). |
| Software Dependencies | No | The paper mentions external solvers like SCS [ODo+16; ODo+19] and MOSEK [MOS21] for comparison, but it does not specify version numbers for the software dependencies used in their own implementation of the algorithms. |
| Experiment Setup | Yes | In Figure 2 we compare Riemannian and Euclidean GD on a dataset consisting of n = 50 covariance matrices of dimension d = 50, each with condition number κ = 1000. Their eigenspaces are independent Haar distributed, and their eigenvalues are equally spaced in the interval [λmin, λmax] = [0.03, 30]. [...] In our experiments, Riemannian SGD was competitive on a wide range of problems with η = 1/t. [...] In Figure 5 we plot the suboptimality gap for the unregularized objective F = ∫ W2(·, Σ) dP(Σ) as we optimize Fε using Algorithm 4 for various ε. The covariance matrices are generated as in Figure 2, with n = d = 30 and [λmin, λmax] = [0.01, 10]. (A sketch of this data-generation recipe appears after the table.) |
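The paper's Algorithm 1 (GD for Barycenters) is listed above but not reproduced here. Below is a minimal sketch of the standard Riemannian GD update on the Bures-Wasserstein manifold, assuming the usual form of the gradient (identity minus the averaged optimal transport maps); the function names, initialization, and step-size defaults are our assumptions, not the authors' code.

```python
import numpy as np
from scipy.linalg import sqrtm

def psd_sqrt(A):
    """Matrix square root of a symmetric PSD matrix (taking the real part
    strips numerical noise from scipy's general-purpose sqrtm)."""
    return np.real(sqrtm(A))

def bw_gd_step(Sigma, Sigmas, eta):
    """One Riemannian GD step for the Bures-Wasserstein barycenter objective.

    Assumed update (standard in this literature, not taken verbatim from the
    paper's code): grad F(Sigma) = I - mean_i T_i, where T_i is the optimal
    transport map from N(0, Sigma) to N(0, Sigma_i), and the step retracts as
    Sigma_+ = (I - eta * grad) Sigma (I - eta * grad).
    """
    d = Sigma.shape[0]
    root = psd_sqrt(Sigma)
    root_inv = np.linalg.inv(root)
    # OT map to N(0, S): T = Sigma^{-1/2} (Sigma^{1/2} S Sigma^{1/2})^{1/2} Sigma^{-1/2}
    T_bar = np.mean(
        [root_inv @ psd_sqrt(root @ S @ root) @ root_inv for S in Sigmas],
        axis=0,
    )
    M = np.eye(d) - eta * (np.eye(d) - T_bar)
    return M @ Sigma @ M

def bw_barycenter(Sigmas, eta=1.0, n_iters=100):
    """Run GD from the Euclidean mean of the inputs (a common initialization)."""
    Sigma = np.mean(Sigmas, axis=0)
    for _ in range(n_iters):
        Sigma = bw_gd_step(Sigma, Sigmas, eta)
    return Sigma
```

With η = 1 this step reduces to the classical fixed-point iteration for Bures-Wasserstein barycenters; smaller η gives a damped gradient step.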
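As a sketch of the data-generation recipe quoted in the Experiment Setup row (eigenspaces drawn from the Haar measure, eigenvalues equally spaced in [λmin, λmax]), one possible implementation uses scipy's `ortho_group`, which samples Haar-distributed orthogonal matrices. The function name and seeding below are our assumptions; the authors' repository may generate the data differently.

```python
import numpy as np
from scipy.stats import ortho_group

def random_covariances(n=50, d=50, lam_min=0.03, lam_max=30.0, seed=0):
    """Generate n covariance matrices of dimension d with Haar-distributed
    eigenspaces and eigenvalues equally spaced in [lam_min, lam_max]."""
    rng = np.random.default_rng(seed)
    eigvals = np.linspace(lam_min, lam_max, d)
    covs = []
    for _ in range(n):
        # Haar-distributed orthogonal matrix = uniformly random eigenspace
        U = ortho_group.rvs(dim=d, random_state=rng)
        covs.append(U @ np.diag(eigvals) @ U.T)
    return covs
```

For the Figure 2 setting, `random_covariances(n=50, d=50, lam_min=0.03, lam_max=30.0)` gives condition number 30/0.03 = 1000; Figure 5 uses n = d = 30 with [λmin, λmax] = [0.01, 10].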