SGD with shuffling: optimal rates without component convexity and large epoch requirements

Authors: Kwangjun Ahn, Chulhee Yun, Suvrit Sra

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study without-replacement SGD for solving finite-sum optimization problems. Specifically, depending on how the indices of the finite-sum are shuffled, we consider the RANDOMSHUFFLE (shuffle at the beginning of each epoch) and SINGLESHUFFLE (shuffle only once) algorithms. First, we establish minimax optimal convergence rates of these algorithms up to poly-log factors. Notably, our analysis is general enough to cover gradient dominated nonconvex costs, and does not rely on the convexity of individual component functions unlike existing optimal convergence results.
Researcher Affiliation | Academia | Kwangjun Ahn* (Department of EECS, MIT, kjahn@mit.edu); Chulhee Yun* (Department of EECS, MIT, chulheey@mit.edu); Suvrit Sra (Department of EECS, MIT, suvrit@mit.edu)
Pseudocode | No | The paper describes the algorithms in prose but does not provide structured pseudocode or algorithm blocks (an illustrative sketch is given after this table).
Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available.
Open Datasets | No | This is a theoretical paper focused on mathematical analysis and proofs of convergence rates. It does not conduct empirical studies, use datasets, or perform training.
Dataset Splits | No | This is a theoretical paper focused on mathematical analysis and proofs of convergence rates. It does not involve dataset splits (training, validation, test) for empirical evaluation.
Hardware Specification | No | This is a theoretical paper that does not report on empirical experiments. Therefore, no hardware specifications are mentioned.
Software Dependencies | No | This is a theoretical paper that does not report on empirical experiments. Therefore, no specific software dependencies with version numbers are mentioned.
Experiment Setup | No | This is a theoretical paper focused on mathematical analysis. It does not include an experimental setup with hyperparameters or system-level training settings.
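
Since the paper itself provides no pseudocode, the following is a minimal Python/NumPy sketch, written for this page rather than taken from the paper, of the two shuffling schemes the abstract describes: RANDOMSHUFFLE draws a fresh permutation of the component indices at the start of every epoch, while SINGLESHUFFLE draws one permutation and reuses it across all epochs. The function name `shuffled_sgd`, the `scheme` argument, and the toy quadratic objective are illustrative assumptions, not the authors' code.

```python
import numpy as np

def shuffled_sgd(grads, x0, lr, n_epochs, scheme="random_shuffle", seed=0):
    """Without-replacement SGD over n component gradient functions.

    scheme="random_shuffle": reshuffle indices at the start of each epoch.
    scheme="single_shuffle": shuffle once, reuse the same order every epoch.
    (Sketch of the schemes described in the abstract; not the authors' code.)
    """
    rng = np.random.default_rng(seed)
    n = len(grads)
    order = rng.permutation(n)  # fixed order used by SINGLESHUFFLE
    x = np.asarray(x0, dtype=float)
    for _ in range(n_epochs):
        if scheme == "random_shuffle":
            order = rng.permutation(n)  # fresh permutation each epoch
        for i in order:
            x = x - lr * grads[i](x)  # one component-gradient step
    return x

# Toy usage: f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2, minimized at mean(a).
a = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, ai=ai: x - ai for ai in a]
x_hat = shuffled_sgd(grads, x0=0.0, lr=0.1, n_epochs=50)
print(x_hat)  # approaches mean(a) = 2.5
```

On this toy quadratic, both schemes converge to the mean of the a_i; the paper's contribution is the minimax-optimal rate analysis of such shuffling schemes (without assuming convexity of the individual components), not the algorithms themselves.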