Closing the convergence gap of SGD without replacement
Authors: Shashank Rajput, Anant Gupta, Dimitris Papailiopoulos
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify our lower bound of Theorem 2, we ran SGDo on the function described in Eq. (5) with L = 4. The step size regimes that were considered were α = 1/T, 2 log T / T, 4 log T / T, 8 log T / T. The plot for α = 4 log T / T is shown in Figure 2. |
| Researcher Affiliation | Academia | 1University of Wisconsin Madison. Correspondence to: Shashank Rajput <rajput3@wisc.edu>. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for these experiments is available at https://github.com/shashankrajput/SGDo. |
| Open Datasets | No | The paper uses a custom function described in Eq. (5) for numerical verification and does not mention or provide access to a publicly available or open dataset. 'To verify our lower bound of Theorem 2, we ran SGDo on the function described in Eq. (5) with L = 4.' |
| Dataset Splits | No | The paper describes experiments on a constructed function by varying parameters (K, n) but does not specify training, validation, or test dataset splits in the conventional sense. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | The step size regimes that were considered were α = 1/T, 2 log T / T, 4 log T / T, 8 log T / T. ... For each value of K, say K = 50, we set α = 4 log(nK)/(nK) = 4 log(500 · 50)/(500 · 50) and ran SGDo with this constant step size α on the sum of n = 500 functions for K = 50 epochs, and the final error was recorded. ... The optimization was initialized at the origin, that is x_0^1 = 0. |
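The setup row above describes SGD without replacement (SGDo): a constant step size α = 4 log(nK)/(nK), n = 500 component functions, K = 50 epochs, and initialization at the origin. A minimal sketch of that loop is below; the quadratic objective and the helper names (`sgdo`, `grads`) are illustrative assumptions, not the paper's Eq. (5).

```python
import math
import random

def sgdo(grads, n_funcs, epochs, x0=0.0, seed=0):
    """SGD without replacement (random reshuffling): each epoch visits
    every component function exactly once in a freshly shuffled order."""
    rng = random.Random(seed)
    T = n_funcs * epochs                 # total number of iterations
    alpha = 4 * math.log(T) / T          # constant step size, as in the sweep above
    x = x0
    indices = list(range(n_funcs))
    for _ in range(epochs):
        rng.shuffle(indices)             # draw a new permutation each epoch
        for i in indices:
            x -= alpha * grads[i](x)     # one component-gradient step
    return x

# Toy problem (stand-in for Eq. (5)): components f_i(x) = (x - a_i)^2 / 2,
# whose average is minimized at mean(a_i) = 0 here.
n, K = 500, 50
targets = [(-1.0) ** i for i in range(n)]          # a_i alternates +1 / -1
grads = [lambda x, a=a: x - a for a in targets]    # grad f_i(x) = x - a_i
x_final = sgdo(grads, n, K, x0=0.0)                # initialized at the origin
```

With a fixed seed the run is deterministic; since the partial sums of a shuffled balanced ±1 sequence are small and the step size is O(log T / T), the iterate stays close to the minimizer at 0.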