Decentralized Riemannian Gradient Descent on the Stiefel Manifold
Authors: Shixiang Chen, Alfredo Garcia, Mingyi Hong, Shahin Shahrampour
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report the convergence results of DRSGD, DRDGD and DRGTA with different t and $\hat{\beta}$ on synthetic data. We fix $m_1 = \dots = m_n = 1000$, $d = 100$ and $r = 5$ and generate $m_1 n$ i.i.d. samples following the standard multivariate Gaussian distribution to obtain A. Let $A = USV^\top$ be the truncated SVD. Given an eigengap $\Delta \in (0, 1)$, we modify the singular values of A to be a geometric sequence, i.e. $S_{i,i} = S_{0,0}\Delta^{i/2}$, $i \in [d]$. Typically, a larger $\Delta$ results in a more difficult problem. In Figure 1, we show the results of DRSGD, DRDGD and DRGTA on the data with $n = 32$ and $\Delta = 0.8$. The y-axis is the log-scale distance. The first four lines in each testing case are for the ring graph, and the last one is on a complete graph with an equally weighted matrix, which aims to show the case of $t \to \infty$. In Figure 1(a), when fixing t, it is shown that smaller $\hat{\beta}$ produces higher accuracy, which verifies Theorem 4.2. We see DRSGD performs almost the same with different $t \in \{1, 10, \infty\}$. For the two deterministic algorithms DRDGD and DRGTA, we see that DRDGD can use a larger $\hat{\beta}$ if more communication rounds t are used in Figure 1(b),(c). DRDGD cannot achieve exact convergence with a constant stepsize, while DRGTA successfully solves the problem using $t \in \{10, \infty\}$, $\hat{\beta} = 0.05$. We provide some numerical results on the MNIST dataset (Le Cun). (A data-generation sketch following this recipe appears after the table.) |
| Researcher Affiliation | Academia | (1) The Wm Michael Barnes '64 Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843, USA. (2) The Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA. |
| Pseudocode | Yes | Algorithm 1: Decentralized Riemannian Stochastic Gradient Descent (DRSGD) for Solving (1.1); Algorithm 2: Decentralized Riemannian Gradient Tracking over Stiefel manifold (DRGTA) for Solving (1.1). (An illustrative single-iterate sketch follows the table.) |
| Open Source Code | Yes | For reproducibility of results, our code is made available at https://github.com/chenshixiang/Decentralized_Riemannian_gradient_descent_on_Stiefel_manifold. |
| Open Datasets | Yes | We provide some numerical results on the MNIST dataset (Le Cun). |
| Dataset Splits | No | The paper refers to using datasets for 'epochs' and 'iterations' but does not explicitly describe train/validation/test splits, nor does it refer to specific predefined splits or cross-validation methods. |
| Hardware Specification | Yes | The experiments are evaluated on an HPC cluster, where each computation node has an Intel Xeon 6248R CPU. The computation nodes are connected by Mellanox HDR 100 InfiniBand. |
| Software Dependencies | No | The code is implemented in Python with mpi4py (Dalcín et al., 2005). While Python and mpi4py are mentioned, specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | For DRSGD, we set the maximum epoch to 200 and early-stop it if $d_s(\bar{x}_k, x^*) \leq 10^{-5}$. For DRGTA and DRDGD, we set the maximum iteration number to $10^4$ and the termination condition is $d_s(\bar{x}_k, x^*) \leq 10^{-8}$ or $\lVert \mathrm{grad} f(\bar{x}_k) \rVert_F \leq 10^{-8}$. We set $\beta_k = \hat{\beta} / (\frac{1}{n}\sum_{i=1}^{n} m_i)$ for DRGTA and DRDGD, where $\hat{\beta}$ will be specified later. For DRSGD, we set $\beta = \hat{\beta}/\sqrt{200}$. We fix $\alpha = 1$ and generate the initial points uniformly at random satisfying $x_{1,0} = \dots = x_{n,0} \in \mathcal{M}$. We set the maximum epoch as 300 in all experiments. The stepsize is set to $\beta = \frac{n}{10000} \cdot \frac{\hat{\beta}}{\sqrt{300}}$, where $\hat{\beta}$ is tuned for the best performance. (A stepsize/termination sketch based on this reading follows the table.) |
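
The data recipe quoted in the Research Type row is concrete enough to sketch in code. Below is a minimal reconstruction in Python/NumPy, assuming an even split of rows across the nodes; the function name `make_synthetic_data` and its defaults are ours, not the authors'.

```python
# Sketch of the quoted synthetic-data recipe (our reconstruction, not the
# authors' code): Gaussian data, truncated SVD, and singular values replaced
# by a geometric sequence controlled by the eigengap parameter delta.
import numpy as np

def make_synthetic_data(n=32, m_per_node=1000, d=100, delta=0.8, seed=0):
    rng = np.random.default_rng(seed)
    # m_1 * n i.i.d. rows from the standard multivariate Gaussian.
    A = rng.standard_normal((n * m_per_node, d))
    # Truncated SVD: A = U S V^T, with d singular values.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    # Geometric spectrum S_{i,i} = S_{0,0} * delta^(i/2); delta closer to 1
    # gives a smaller relative eigengap, i.e. a harder problem.
    S_geo = S[0] * delta ** (np.arange(d) / 2)
    A = U @ np.diag(S_geo) @ Vt
    # Distribute the rows evenly across the n nodes.
    return np.split(A, n, axis=0)
```

For the decentralized PCA problem studied in the paper, the global optimizer is then the subspace spanned by the first r = 5 columns of V.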
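For the Pseudocode row: Algorithms 1 and 2 combine multi-step consensus with Riemannian updates on the Stiefel manifold. The sketch below shows the structure of one DRDGD-style iterate under simplifying assumptions (tangent-space projection under the Euclidean metric, a polar retraction, and a doubly stochastic mixing matrix W applied t times); it illustrates the shape of the update, not the authors' exact algorithms.

```python
import numpy as np

def proj_tangent(x, v):
    # Projection of v onto the tangent space of St(d, r) at x
    # (Euclidean metric): v - x * sym(x^T v).
    xtv = x.T @ v
    return v - x @ ((xtv + xtv.T) / 2)

def polar_retraction(x, xi):
    # Polar retraction R_x(xi) = (x + xi)(I + xi^T xi)^(-1/2), via SVD.
    U, _, Vt = np.linalg.svd(x + xi, full_matrices=False)
    return U @ Vt

def drdgd_step(xs, W, egrads, beta, alpha=1.0, t=1):
    # xs: length-n list of local iterates on St(d, r); W: n x n doubly
    # stochastic mixing matrix; egrads: local Euclidean gradients.
    stacked = np.stack(xs)                        # shape (n, d, r)
    for _ in range(t):                            # t consensus rounds
        stacked = np.einsum('ij,jdr->idr', W, stacked)
    out = []
    for i, x in enumerate(xs):
        consensus = proj_tangent(x, stacked[i] - x)
        descent = proj_tangent(x, egrads[i])
        out.append(polar_retraction(x, alpha * consensus - beta * descent))
    return out
```

DRSGD replaces the local gradients with stochastic ones, and DRGTA additionally maintains a gradient-tracking variable that is itself averaged over the network; see the paper's Algorithms 1 and 2 for the exact updates.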
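Finally, the stepsize rules in the Experiment Setup row are reconstructed from garbled PDF extraction, so the exact expressions (dividing $\hat{\beta}$ by the average sample count, and the $\sqrt{200}$ and $\sqrt{300}$ factors) are our assumptions rather than confirmed values. A small helper encoding that reading:

```python
import numpy as np

def drgta_stepsize(beta_hat, m_sizes):
    # Assumed reading: beta_k = beta_hat / ((1/n) * sum_i m_i), constant in k.
    return beta_hat / np.mean(m_sizes)

def drsgd_stepsize(beta_hat, max_epoch=200):
    # Assumed reading: beta = beta_hat / sqrt(max_epoch).
    return beta_hat / np.sqrt(max_epoch)

def should_stop(dist_to_opt, grad_norm, tol_dist=1e-8, tol_grad=1e-8):
    # DRGTA/DRDGD termination: d_s(x_bar_k, x*) <= 1e-8
    # or ||grad f(x_bar_k)||_F <= 1e-8.
    return dist_to_opt <= tol_dist or grad_norm <= tol_grad
```

Under this reading, $m_i = 1000$ and $\hat{\beta} = 0.05$ give a deterministic stepsize of $5 \times 10^{-5}$.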