Decentralized Riemannian Gradient Descent on the Stiefel Manifold

Authors: Shixiang Chen, Alfredo Garcia, Mingyi Hong, Shahin Shahrampour

ICML 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We report the convergence results of DRSGD, DRDGD and DRGTA with different t and ˆβ on synthetic data. We fix m1 = . . . = mn = 1000, d = 100 and r = 5 and generate m1 n i.i.d samples following standard multivariate Gaussian distribution to obtain A. Let A = USV be the truncated SVD. Given an eigengap (0, 1), we modify the singular values of A to be a geometric sequence, i.e. Si,i = S0,0 i/2, i [d]. Typically, larger results in more difficult problem. In Figure 1, we show the results of DRSGD, DRDGD and DRGTA on the data with n = 32 and = 0.8. The y-axis is the log-scale distance. The first four lines in each testing case are for the ring graph, and the last one is on a complete graph with equally weighted matrix, which aims to show the case of t . In Figure 1(a), when fixing ˆβ, it is shown that that smaller ˆβ produces higher accuracy, which verifies Theorem 4.2. We see DRSGD performs almost the same with different t {1, 10, }. For the two deterministic algorithms DRDGD and DRGTA, we see that DRDGD can use larger ˆβ if more communication rounds t is used in Figure 1(b),(c). DRDGD cannot achieve exact convergence with constant stepsize, while DRGTA successfully solves the problem using t {10, }, ˆβ = 0.05. We provide some numerical results on the MNIST dataset (Le Cun).
Researcher Affiliation Academia 1 The Wm Michael Barnes '64 Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843, USA. 2 The Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA.
Pseudocode Yes Algorithm 1: Decentralized Riemannian Stochastic Gradient Descent (DRSGD) for Solving (1.1); Algorithm 2: Decentralized Riemannian Gradient Tracking over the Stiefel manifold (DRGTA) for Solving (1.1). (An illustrative single-iteration sketch appears after this table.)
Open Source Code Yes For reproducibility of results, our code is made available at https://github.com/chenshixiang/Decentralized_Riemannian_gradient_descent_on_Stiefel_manifold.
Open Datasets Yes We provide some numerical results on the MNIST dataset (Le Cun).
Dataset Splits No The paper refers to using datasets for 'epochs' and 'iterations' but does not explicitly describe train/validation/test splits, nor does it refer to specific predefined splits or cross-validation methods.
Hardware Specification Yes The experiments are evaluated on an HPC cluster, where each computation node is an Intel Xeon 6248R CPU. The computation nodes are connected by Mellanox HDR 100 InfiniBand.
Software Dependencies No The codes are implemented in Python with mpi4py (Dalcín et al., 2005). While Python and mpi4py are mentioned, specific version numbers for these software components are not provided. (A minimal mpi4py communication sketch appears after this table.)
Experiment Setup Yes For DRSGD, we set the maximum epoch to 200 and early stop it if d_s(x̄_k, x*) ≤ 10^-5. For DRGTA and DRDGD, we set the maximum iteration number to 10^4 and the termination condition is d_s(x̄_k, x*) ≤ 10^-8 or ‖grad f(x̄_k)‖_F ≤ 10^-8. We set β_k = β̂ / ((1/n) Σ_{i=1}^n m_i) for DRGTA and DRDGD, where β̂ will be specified later. For DRSGD, we set β = β̂ / 200. We fix α = 1 and generate the initial points uniformly at random satisfying x_{1,0} = ... = x_{n,0} ∈ M. We set the maximum epoch as 300 in all experiments. The stepsize is set to β = (n/10000)/300 · β̂, where β̂ is tuned for the best performance. (A sketch of the stopping metrics appears after this table.)
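
To make the synthetic-data recipe quoted under Research Type concrete, here is a minimal NumPy sketch under stated assumptions: the variable names, the eigengap symbol `delta`, and the even row split across agents are ours, not taken from the authors' repository.

```python
import numpy as np

# Minimal sketch of the synthetic-data construction described above.
# Assumptions: m_i = 1000 samples per node, d = 100, r = 5, n = 32 agents,
# eigengap parameter delta = 0.8; all names are illustrative.
n, m_i, d, r, delta = 32, 1000, 100, 5, 0.8

rng = np.random.default_rng(0)
A = rng.standard_normal((n * m_i, d))          # i.i.d. standard Gaussian samples

# Truncated SVD A = U S V^T, then reshape the spectrum into a geometric sequence
U, S, Vt = np.linalg.svd(A, full_matrices=False)
S_new = S[0] * delta ** (np.arange(d) / 2.0)   # S_{i,i} = S_{0,0} * delta^{i/2}
A = U @ np.diag(S_new) @ Vt                    # data matrix with controlled eigengap

# Split the rows evenly across the n agents of the decentralized network
A_local = np.split(A, n)                       # A_local[i] has shape (m_i, d)
```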
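The two algorithms listed under Pseudocode are stated formally in the paper; as a reading aid, here is a hedged sketch of one DRSGD-style iteration. The tangent-space consensus direction, the QR retraction, and all names (`qr_retraction`, `drsgd_like_step`, `W`, `alpha`) are our assumptions rather than the authors' implementation, and DRGTA's gradient-tracking variable is omitted.

```python
import numpy as np

def qr_retraction(y):
    """Map a d x r matrix to the Stiefel manifold via thin QR factorization."""
    q, rmat = np.linalg.qr(y)
    signs = np.sign(np.sign(np.diag(rmat)) + 0.5)   # make the factorization unique
    return q * signs

def proj_tangent(x, g):
    """Project a Euclidean gradient g onto the tangent space of St(d, r) at x."""
    xtg = x.T @ g
    return g - x @ (xtg + xtg.T) / 2.0

def drsgd_like_step(x_list, grad_list, W, beta, alpha=1.0, t=1):
    """One illustrative DRSGD-style iteration over all n agents.

    x_list[i]    : iterate of agent i, a d x r matrix with orthonormal columns
    grad_list[i] : (stochastic) Euclidean gradient of f_i at x_list[i]
    W            : n x n doubly stochastic mixing matrix of the graph
    t            : number of consensus (communication) rounds per iteration
    """
    X = np.stack(x_list)                        # (n, d, r)
    Wt = np.linalg.matrix_power(W, t)           # t rounds of gossip mixing
    mixed = np.einsum('ij,jdr->idr', Wt, X)     # weighted neighborhood averages
    new_x = []
    for i, x in enumerate(x_list):
        consensus = proj_tangent(x, mixed[i] - x)    # pull toward neighbors, tangentially
        rgrad = proj_tangent(x, grad_list[i])        # Riemannian gradient direction
        new_x.append(qr_retraction(x + alpha * consensus - beta * rgrad))
    return new_x
```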
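The Software Dependencies row notes that the released code uses Python with mpi4py. The fragment below only sketches how one gossip-averaging round on a ring graph could be wired up with mpi4py; the ring topology, the uniform 1/3 mixing weights, and the variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np
from mpi4py import MPI

# Sketch of one gossip-averaging round on a ring graph with mpi4py.
# Assumption: each MPI rank plays one agent and x is its local d x r iterate.
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

d, r = 100, 5
x = np.linalg.qr(np.random.randn(d, r))[0]                 # a point on the Stiefel manifold

left, right = (rank - 1) % size, (rank + 1) % size
x_from_left = comm.sendrecv(x, dest=right, source=left)    # pass right, receive from left
x_from_right = comm.sendrecv(x, dest=left, source=right)   # pass left, receive from right
x_mixed = (x + x_from_left + x_from_right) / 3.0           # one weighted consensus step
```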
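Finally, for the stopping rules quoted under Experiment Setup, the sketch below shows one way the two metrics could be computed. The projected mean iterate and the rotation-invariant reading of d_s are our interpretations (the paper gives its own definitions); only the 10^-5 / 10^-8 thresholds come from the quoted text.

```python
import numpy as np

def stiefel_mean(x_list):
    """Arithmetic mean of the agents' iterates, projected back onto the Stiefel
    manifold via the polar decomposition; a stand-in for the averaged iterate."""
    m = sum(x_list) / len(x_list)
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def subspace_distance(x, x_star):
    """Rotation-invariant distance between span(x) and span(x_star); one common
    reading of d_s (an assumption, not necessarily the paper's exact definition)."""
    return np.linalg.norm(x @ x.T - x_star @ x_star.T, 'fro')

def pca_riem_grad_norm(x, A):
    """Frobenius norm of the Riemannian gradient of f(x) = -trace(x^T A^T A x)/2 on St(d, r)."""
    g = -(A.T @ (A @ x))                    # Euclidean gradient of the PCA objective
    xtg = x.T @ g
    rgrad = g - x @ (xtg + xtg.T) / 2.0     # project onto the tangent space at x
    return np.linalg.norm(rgrad, 'fro')

# Illustrative stopping test with the quoted thresholds (x_list, x_star, A assumed given):
# x_bar = stiefel_mean(x_list)
# done = subspace_distance(x_bar, x_star) <= 1e-8 or pca_riem_grad_norm(x_bar, A) <= 1e-8
```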