Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

Authors: Chulhee Yun, Shashank Rajput, Suvrit Sra

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In Appendix C, we present numerical experiments that corroborate our theoretical findings.
Researcher Affiliation Academia Chulhee Yun KAIST AI chulhee.yun@kaist.ac.kr Shashank Rajput Univ. of Wisconsin-Madison CS rajput3@wisc.edu Suvrit Sra MIT EECS suvrit@mit.edu
Pseudocode Yes Algorithm 1 Local RR (with and without SYNCSHUF) ... Algorithm 2 Minibatch RR (with and without SYNCSHUF)
Open Source Code No The paper states, 'This paper is a theoretical work, without any experimental results.' (though Appendix C contradicts the experimental claim) and does not provide any link or explicit statement about the release of source code for the described methodology.
Open Datasets No The paper states it evaluates algorithms 'on the hard instance constructed in our lower bounds (Theorems 3 and 4)'. This refers to a synthetic function constructed for theoretical analysis, not a publicly available dataset with specific access information. It provides parameters like L, µ, ν, N, M rather than dataset names or links.
Dataset Splits No The paper describes experiments on a synthetic problem instance. It does not mention or define training, validation, or test dataset splits.
Hardware Specification No The paper does not specify any hardware used for running the numerical experiments in Appendix C. It only mentions parameters of the synthetic problem instance (e.g., 'L = 100, µ = 1, ν = 1, N = 768, and M = 16').
Software Dependencies No The paper does not list specific software dependencies with version numbers (e.g., libraries, frameworks, or programming language versions) used for the numerical experiments.
Experiment Setup Yes We compare the performance of the algorithms on this problem instance, with L = 100, µ = 1, ν = 1, N = 768, and M = 16, while varying the choice of B {1, 4, 16, 64, 256} and K {1, 3, 5, 7, 10, 30, 50, 70, 100, 300, 500, 700, 1000}. For each value of B and K, we run the algorithms for K epochs (KN/B communication rounds for with-replacement algorithms) starting at x0 = 0 and return the values of F evaluated at the last iterates.