Tighter Convergence Bounds for Shuffled SGD via Primal-Dual Perspective
Authors: Xufeng Cai, Cheuk Yin Lin, Jelena Diakonikolas
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we numerically demonstrate on common machine learning datasets that our bounds are indeed much tighter, thus offering a bridge between theory and practice. |
| Researcher Affiliation | Academia | Xufeng Cai, Department of Computer Sciences, University of Wisconsin-Madison, xcai74@wisc.edu; Cheuk Yin Lin, Department of Computer Sciences, University of Wisconsin-Madison, cylin@cs.wisc.edu; Jelena Diakonikolas, Department of Computer Sciences, University of Wisconsin-Madison, jelena@cs.wisc.edu |
| Pseudocode | Yes | Algorithm 1 Shuffled SGD (Primal-Dual View, General Convex Smooth) |
| Open Source Code | No | The paper mentions implementing computations in Julia but does not provide concrete access (e.g., a link to a repository) to the source code for the methodology described. |
| Open Datasets | Yes | We also compare $\hat{L}$ and $L_{\max}$ on a number of benchmarking datasets from LIBSVM [15], MNIST [17], CIFAR10 [22], and Broad Bioimage Benchmark Collection [28]. |
| Dataset Splits | No | The paper mentions using standard machine learning datasets but does not provide specific details on how the datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper mentions that computations are implemented in Julia and processed under |
| Software Dependencies | No | The paper mentions using "Julia" and the "Julia Arpack Package" but does not specify version numbers for either, which is required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | To compare the effect of the step size $\eta$ from prior work and our work, we take $\eta = 1/(2 n L_{\max})$ based on [30], and $\eta = 1/(n\sqrt{\hat{L}\tilde{L}})$ from our work, where $\hat{L}$, $\tilde{L}$ are our novel fine-grained, data-dependent smoothness parameters defined in Section 3 for smooth convex finite-sum problems with linear predictors. |
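
To make the Experiment Setup row concrete, the following is a minimal sketch of shuffled SGD (random reshuffling) using the prior-work step size 1/(2 n Lmax) quoted in the table. It is not the authors' Julia implementation. The least-squares loss, the toy data, the helper names (`shuffled_sgd`, `l_max`, `objective`), and the choice of Lmax as the largest squared row norm are all assumptions made for illustration. The paper's data-dependent parameters from Section 3 are not defined on this page, so they are not computed here.

```python
import random


def shuffled_sgd(A, b, eta, epochs, seed=0):
    """Random-reshuffling SGD on f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2.

    Assumed toy objective (least squares with linear predictors).
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # draw a fresh permutation every epoch
        for i in order:  # one pass over the data, without replacement
            residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
            # Gradient of the i-th component is residual * a_i.
            x = [xj - eta * residual * a for xj, a in zip(x, A[i])]
    return x


def l_max(A):
    """Largest component smoothness constant; for least squares, max_i ||a_i||^2."""
    return max(sum(a * a for a in row) for row in A)


def objective(A, b, x):
    """Average least-squares loss at x."""
    n = len(A)
    return sum(
        0.5 * (sum(a * xj for a, xj in zip(row, x)) - bi) ** 2
        for row, bi in zip(A, b)
    ) / n


# Hypothetical toy problem with a known exact solution x_true.
A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
x_true = [1.0, -1.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]

n = len(A)
eta = 1.0 / (2 * n * l_max(A))  # step size form quoted in the table for prior work
x = shuffled_sgd(A, b, eta, epochs=300)
```

A different step size, such as the paper's data-dependent choice, can be compared by passing another value of `eta` to `shuffled_sgd` and inspecting `objective(A, b, x)` after the same number of epochs.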