Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Authors: Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the proposed algorithms against numerous optimization methods on standard classification tasks using both kernel methods and deep networks. The proposed methods result in competitive performance across all models and datasets, while being robust to the precise choices of hyper-parameters. For multi-class classification using deep networks, SGD with Armijo line-search results in both faster convergence and better generalization. |
| Researcher Affiliation | Collaboration | Sharan Vaswani Mila, Université de Montréal; Aaron Mishkin University of British Columbia; Issam Laradji University of British Columbia; Mark Schmidt University of British Columbia, 1QBit; CCAI Affiliate Chair (Amii); Gauthier Gidel Mila, Université de Montréal; Simon Lacoste-Julien Mila, Université de Montréal |
| Pseudocode | Yes | Algorithm 1 SGD+Armijo(f, w0, η_max, b, c, β, γ, opt); Algorithm 2 reset(η, η_max, γ, b, k, opt); Algorithm 3 in Appendix H gives pseudo-code for SGD with the Goldstein line-search. (A minimal illustrative sketch of SGD with Armijo line-search is given after the table.) |
| Open Source Code | Yes | The code to reproduce our results can be found at https://github.com/IssamLaradji/sls. |
| Open Datasets | Yes | We experiment with four standard datasets: mushrooms, rcv1, ijcnn, and w8a from LIBSVM [14]. [...] For MNIST, we use a 1 hidden-layer multi-layer perceptron (MLP) of width 1000. For CIFAR10 and CIFAR100, we experiment with the standard image-classification architectures: ResNet-34 [28] and DenseNet-121 [29]. |
| Dataset Splits | No | The paper uses standard datasets (the LIBSVM datasets, MNIST, CIFAR10, and CIFAR100), but it does not explicitly state the train/validation/test splits (e.g., percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions general tasks like 'training deep networks' without specifying the underlying hardware. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). It mentions comparing against optimization methods like Adam, but does not provide specific software environment details or versions used for the implementation itself. |
| Experiment Setup | Yes | Appendix F gives additional details on our experimental setup and the default hyper-parameters used for the proposed line-search methods. |
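To make the pseudocode row concrete, the following is a minimal NumPy sketch of SGD with an Armijo backtracking line-search on a toy least-squares problem. It is an illustration of the technique only, not the authors' reference implementation (see the linked repository for that); the function names, the synthetic data, and the simple "reset the step size to η_max at every iteration" rule are assumptions made for brevity rather than the paper's reset heuristic.

```python
import numpy as np

def sgd_armijo(grad_fn, loss_fn, w0, data, n_epochs=10, batch_size=16,
               eta_max=1.0, c=0.1, beta=0.7):
    """Sketch of SGD with Armijo backtracking line-search.

    At each step the step size starts at eta_max (a simplifying assumption)
    and is multiplied by beta until the mini-batch loss satisfies the
    stochastic Armijo condition:
        f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2
    """
    X, y = data
    n = X.shape[0]
    w = w0.copy()
    for _ in range(n_epochs):
        for idx in np.array_split(np.random.permutation(n), n // batch_size):
            Xb, yb = X[idx], y[idx]
            g = grad_fn(w, Xb, yb)          # mini-batch gradient
            f_w = loss_fn(w, Xb, yb)        # mini-batch loss at current point
            eta = eta_max                   # simple reset heuristic (assumption)
            # Backtrack until the Armijo condition holds on this mini-batch.
            while loss_fn(w - eta * g, Xb, yb) > f_w - c * eta * np.dot(g, g):
                eta *= beta
                if eta < 1e-8:              # numerical safeguard
                    break
            w -= eta * g
    return w

# Toy usage: least-squares regression on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true
loss = lambda w, X, y: 0.5 * np.mean((X @ w - y) ** 2)
grad = lambda w, X, y: X.T @ (X @ w - y) / len(y)
w_hat = sgd_armijo(grad, loss, np.zeros(5), (X, y))
print("final loss:", loss(w_hat, X, y))
```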