Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
Authors: Chulhee Yun, Shashank Rajput, Suvrit Sra
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Appendix C, we present numerical experiments that corroborate our theoretical findings. |
| Researcher Affiliation | Academia | Chulhee Yun KAIST AI chulhee.yun@kaist.ac.kr Shashank Rajput Univ. of Wisconsin-Madison CS rajput3@wisc.edu Suvrit Sra MIT EECS suvrit@mit.edu |
| Pseudocode | Yes | Algorithm 1 Local RR (with and without SYNCSHUF) ... Algorithm 2 Minibatch RR (with and without SYNCSHUF) |
| Open Source Code | No | The paper states, 'This paper is a theoretical work, without any experimental results.' (though Appendix C contradicts the experimental claim) and does not provide any link or explicit statement about the release of source code for the described methodology. |
| Open Datasets | No | The paper states it evaluates algorithms 'on the hard instance constructed in our lower bounds (Theorems 3 and 4)'. This refers to a synthetic function constructed for theoretical analysis, not a publicly available dataset with specific access information. It provides parameters like L, µ, ν, N, M rather than dataset names or links. |
| Dataset Splits | No | The paper describes experiments on a synthetic problem instance. It does not mention or define training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware used for running the numerical experiments in Appendix C. It only mentions parameters of the synthetic problem instance (e.g., 'L = 100, µ = 1, ν = 1, N = 768, and M = 16'). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., libraries, frameworks, or programming language versions) used for the numerical experiments. |
| Experiment Setup | Yes | We compare the performance of the algorithms on this problem instance, with L = 100, µ = 1, ν = 1, N = 768, and M = 16, while varying the choice of B {1, 4, 16, 64, 256} and K {1, 3, 5, 7, 10, 30, 50, 70, 100, 300, 500, 700, 1000}. For each value of B and K, we run the algorithms for K epochs (KN/B communication rounds for with-replacement algorithms) starting at x0 = 0 and return the values of F evaluated at the last iterates. |