Random Reshuffling: Simple Analysis with Vast Improvements
Authors: Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run our experiments on the ℓ2-regularized logistic regression problem given by... For better parallelism, we use minibatches of size 512 for all methods and datasets. We set λ = L/N and use stepsizes decreasing as O(1/t). See the appendix for more details on the parameters used, implementation details, and reproducibility. Observations. One notable property of all shuffling methods is that they converge with oscillations, as can be seen in Figure 1. |
| Researcher Affiliation | Academia | Konstantin Mishchenko KAUST Thuwal, Saudi Arabia Ahmed Khaled Cairo University Giza, Egypt Peter Richtárik KAUST Thuwal, Saudi Arabia |
| Pseudocode | Yes | Algorithm 1 Random Reshuffling (RR). Input: stepsize γ > 0, initial vector x_0 = x_0^0 ∈ ℝ^d, number of epochs T. 1: for epochs t = 0, 1, …, T − 1 do 2: Sample a permutation π_0, π_1, …, π_{n−1} of {1, 2, …, n} 3: for i = 0, 1, …, n − 1 do 4: x_t^{i+1} = x_t^i − γ∇f_{π_i}(x_t^i) 5: x_{t+1} = x_t^n. (A runnable sketch follows the table.) |
| Open Source Code | Yes | Reproducibility. Our code is provided at https://github.com/konstmish/random_reshuffling. |
| Open Datasets | Yes | Our code is provided at https://github.com/konstmish/random_reshuffling. All used datasets are publicly available and all additional implementation details are provided in the appendix. Figure 1: Top: real-sim dataset (N = 72,309; d = 20,958), middle row: w8a dataset (N = 49,749; d = 300), bottom: RCV1 dataset (N = 804,414; d = 47,236). |
| Dataset Splits | No | The paper does not explicitly describe train/validation/test splits in the main text; it mentions minibatching but gives no splitting strategy for the datasets. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models or memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions software in a general context (e.g., 'deep learning', 'training neural networks') but does not provide specific software names with version numbers for reproducible software dependencies (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x). |
| Experiment Setup | Yes | For better parallelism, we use minibatches of size 512 for all methods and datasets. We set λ = L/N and use stepsizes decreasing as O(1/t). See the appendix for more details on the parameters used, implementation details, and reproducibility. (A minimal setup sketch follows the table.) |
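For concreteness, here is a minimal NumPy sketch of Algorithm 1 as quoted in the Pseudocode row. The function name `random_reshuffling` and the list-of-gradient-oracles interface are illustrative choices, not the authors' code (their implementation is at the GitHub link above); a constant stepsize γ is used for simplicity, whereas the experiments use decreasing stepsizes.

```python
import numpy as np

def random_reshuffling(grad_fns, x0, gamma, num_epochs, rng=None):
    """Random Reshuffling (Algorithm 1): each epoch takes one gradient
    step per component f_i, in a freshly sampled random order."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    n = len(grad_fns)
    for t in range(num_epochs):
        perm = rng.permutation(n)  # sample a permutation pi_0, ..., pi_{n-1}
        for i in range(n):
            # Inner step: x_t^{i+1} = x_t^i - gamma * grad f_{pi_i}(x_t^i)
            x = x - gamma * grad_fns[perm[i]](x)
        # x_{t+1} = x_t^n: the epoch's last inner iterate carries over.
    return x
```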
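And a hedged sketch of the quoted experimental setup: ℓ2-regularized logistic regression run with RR on minibatches of size 512, λ = L/N, and stepsizes decreasing as O(1/t). The helper names (`logreg_grad`, `run_rr_logreg`), the base stepsize `gamma0`, and the Frobenius-norm upper bound on the smoothness constant L are assumptions for illustration; consult the authors' repository and appendix for the exact parameters.

```python
import numpy as np

def logreg_grad(w, A, b, lam):
    """Gradient of the l2-regularized logistic loss
    (1/m) * sum_i log(1 + exp(-b_i * a_i^T w)) + (lam/2) * ||w||^2
    over the rows of A, with labels b in {-1, +1}."""
    margins = b * (A @ w)
    coeffs = -b / (1.0 + np.exp(margins))  # per-sample log-loss derivative
    return A.T @ coeffs / A.shape[0] + lam * w

def run_rr_logreg(A, b, num_epochs, batch_size=512, gamma0=1.0, rng=None):
    """RR on minibatches of size 512 with O(1/t) decreasing stepsizes,
    mirroring the setup quoted above, with lam = L / N."""
    rng = np.random.default_rng() if rng is None else rng
    N, d = A.shape
    # Smoothness constant of the logistic loss is lambda_max(A^T A) / (4N);
    # the Frobenius norm gives a cheap upper bound (an assumption here).
    L = (np.linalg.norm(A, ord="fro") ** 2) / (4 * N)
    lam = L / N
    w = np.zeros(d)
    num_batches = N // batch_size  # remainder dropped for simplicity
    for t in range(num_epochs):
        gamma = gamma0 / (t + 1)  # stepsize decreasing as O(1/t)
        perm = rng.permutation(N)  # reshuffle the data each epoch
        for k in range(num_batches):
            idx = perm[k * batch_size:(k + 1) * batch_size]
            w = w - gamma * logreg_grad(w, A[idx], b[idx], lam)
    return w
```

Reshuffling at the minibatch level, as above, keeps the without-replacement structure of Algorithm 1 while batching gradient evaluations for parallelism, which matches the paper's stated motivation for the batch size of 512.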