SGD with shuffling: optimal rates without component convexity and large epoch requirements
Authors: Kwangjun Ahn, Chulhee Yun, Suvrit Sra
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study without-replacement SGD for solving finite-sum optimization problems. Specifically, depending on how the indices of the finite-sum are shuffled, we consider the RANDOMSHUFFLE (shuffle at the beginning of each epoch) and SINGLESHUFFLE (shuffle only once) algorithms. First, we establish minimax optimal convergence rates of these algorithms up to poly-log factors. Notably, our analysis is general enough to cover gradient dominated nonconvex costs, and does not rely on the convexity of individual component functions unlike existing optimal convergence results. |
| Researcher Affiliation | Academia | Kwangjun Ahn* (Department of EECS, MIT, kjahn@mit.edu), Chulhee Yun* (Department of EECS, MIT, chulheey@mit.edu), Suvrit Sra (Department of EECS, MIT, suvrit@mit.edu) |
| Pseudocode | No | The paper describes the RANDOMSHUFFLE and SINGLESHUFFLE algorithms in prose but does not provide structured pseudocode or algorithm blocks (a minimal sketch of both schemes appears after this table). |
| Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available. |
| Open Datasets | No | This is a theoretical paper focused on mathematical analysis and proofs of convergence rates. It does not conduct empirical studies, use datasets, or perform training. |
| Dataset Splits | No | This is a theoretical paper focused on mathematical analysis and proofs of convergence rates. It does not involve dataset splits (training, validation, test) for empirical evaluation. |
| Hardware Specification | No | This is a theoretical paper that does not report on empirical experiments. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | This is a theoretical paper that does not report on empirical experiments. Therefore, no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | This is a theoretical paper focused on mathematical analysis. It does not include an experimental setup with hyperparameters or system-level training settings. |
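Although the paper gives no algorithm blocks, the two shuffling schemes it analyzes are straightforward to state: RANDOMSHUFFLE draws a fresh permutation of the component indices at the start of every epoch, while SINGLESHUFFLE draws one permutation and reuses it for all epochs. The Python sketch below illustrates both on a toy finite-sum least-squares problem; the problem instance, step size, and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
# The data below is an arbitrary illustrative instance.
rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def component_grad(x, i):
    """Gradient of the i-th component f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

def shuffled_sgd(x0, epochs, lr, scheme="random"):
    """Without-replacement SGD, as studied in the paper.

    scheme="random": RANDOMSHUFFLE -- reshuffle at the start of each epoch.
    scheme="single": SINGLESHUFFLE -- shuffle once, reuse the permutation.
    """
    x = x0.copy()
    perm = rng.permutation(n)  # the single permutation used by SINGLESHUFFLE
    for _ in range(epochs):
        if scheme == "random":
            perm = rng.permutation(n)  # fresh permutation each epoch
        for i in perm:  # one pass over all n components, without replacement
            x -= lr * component_grad(x, i)
    return x

x_rs = shuffled_sgd(np.zeros(d), epochs=50, lr=0.01, scheme="random")
x_ss = shuffled_sgd(np.zeros(d), epochs=50, lr=0.01, scheme="single")
```

The only structural difference between the two schemes is where the permutation is drawn; the per-epoch pass over all n components without replacement is identical, which is what distinguishes both from standard with-replacement SGD.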