On Convergence of Incremental Gradient for Non-convex Smooth Functions
Authors: Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present illustrative numerical experiments comparing different strategies for selecting stochastic gradients: SGD (sampling gradients with replacement), Single Shuffle (SS, using one random permutation for all epochs), and Random Reshuffling (RR, generating a new permutation for each epoch). We demonstrate that both shuffle strategies are not only beneficial due to their simpler and faster implementations, but also achieve comparable or even better convergence than plain SGD. (An illustrative sketch of these three sampling strategies follows the table.) |
| Researcher Affiliation | Academia | ¹Machine Learning and Optimization Laboratory (MLO), EPFL, Lausanne, Switzerland; ²CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. |
| Pseudocode | No | The paper presents 'Algorithm (2)' only as a mathematical equation for the update rule and does not lay it out in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Logistic regression on the Australian dataset (Chang & Lin, 2011); logistic regression models trained on machine learning datasets from (Chang & Lin, 2011); the MNIST dataset; the CIFAR dataset. |
| Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits using percentages, absolute counts, or references to predefined splits. It mentions using training and testing data for some experiments, but gives no details on the splits and makes no explicit mention of a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running the experiments. It only generally refers to the 'computational environment' in the Appendix. |
| Software Dependencies | No | The paper states 'The methods are implemented in Python 3.' but does not provide specific version numbers for Python libraries, frameworks (like PyTorch or TensorFlow), or other software dependencies. |
| Experiment Setup | Yes | We tune the stepsize over a fixed grid separately for each method and for each n; we apply all the methods starting from x0 = 0 and using a constant stepsize γ > 0, varying several values of γ; a three-layer neural network (one convolutional layer and two fully-connected layers with tanh activation functions) with a total number of parameters d = 140697; sampling batches of a fixed size 256. (A hedged sketch of such a network follows the table.) |
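
The three gradient-selection strategies quoted in the Research Type row (SGD with replacement, Single Shuffle, Random Reshuffling) and the constant-stepsize incremental-gradient update from the Experiment Setup row can be summarised in a few lines of code. The sketch below is an illustration written for this report, not the authors' implementation (the paper releases no code); the helper names `index_stream` and `run`, the use of NumPy, and the least-squares example are all assumptions made for illustration.

```python
import numpy as np

def index_stream(n, epochs, strategy, seed=0):
    """Yield component indices i for the incremental-gradient loop under the
    three samplers compared in the paper: SGD (i.i.d. with replacement),
    Single Shuffle (SS, one permutation reused every epoch), and
    Random Reshuffling (RR, a fresh permutation each epoch)."""
    rng = np.random.default_rng(seed)
    fixed_perm = rng.permutation(n)              # drawn once, reused by SS
    for _ in range(epochs):
        if strategy == "SGD":
            yield from rng.integers(0, n, size=n)
        elif strategy == "SS":
            yield from fixed_perm
        elif strategy == "RR":
            yield from rng.permutation(n)
        else:
            raise ValueError(f"unknown strategy: {strategy}")

def run(grad, x0, n, epochs, gamma, strategy):
    """Constant-stepsize incremental gradient: x <- x - gamma * grad_i(x)."""
    x = x0.copy()
    for i in index_stream(n, epochs, strategy):
        x = x - gamma * grad(i, x)
    return x

# Toy usage: least-squares components f_i(x) = 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 10)), rng.normal(size=50)
grad = lambda i, x: A[i] * (A[i] @ x - b[i])
for strategy in ("SGD", "SS", "RR"):
    x = run(grad, np.zeros(10), n=50, epochs=200, gamma=0.01, strategy=strategy)
    print(strategy, 0.5 * np.mean((A @ x - b) ** 2))
```

In a stepsize sweep like the one described in the Experiment Setup row, `gamma` would be varied over a fixed grid separately for each strategy.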
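
The neural network in the Experiment Setup row is described only at a high level (one convolutional layer, two fully-connected layers, tanh activations, d = 140697 parameters in total). Below is a hedged PyTorch-style sketch of such a model, assuming an MNIST-shaped input (1×28×28); the framework choice, channel width, kernel size, and hidden size are guesses made for illustration, so the parameter count will not match the reported d = 140697.

```python
import torch
import torch.nn as nn

class ThreeLayerNet(nn.Module):
    """Hypothetical reconstruction of the described model: one convolutional
    layer and two fully-connected layers with tanh activations. Layer sizes
    are assumptions, so the total parameter count differs from d = 140697."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # assumes 1x28x28 input
        self.pool = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(8 * 14 * 14, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        x = torch.tanh(self.pool(self.conv(x)))
        x = torch.tanh(self.fc1(x.flatten(1)))
        return self.fc2(x)

print(sum(p.numel() for p in ThreeLayerNet().parameters()))  # ~101k with these guessed sizes
```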