Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Authors: Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the proposed algorithms against numerous optimization methods on standard classification tasks using both kernel methods and deep networks. The proposed methods result in competitive performance across all models and datasets, while being robust to the precise choices of hyper-parameters. For multi-class classification using deep networks, SGD with Armijo line-search results in both faster convergence and better generalization. |
| Researcher Affiliation | Collaboration | Sharan Vaswani Mila, Université de Montréal; Aaron Mishkin University of British Columbia; Issam Laradji University of British Columbia; Mark Schmidt University of British Columbia, 1QBit; CCAI Affiliate Chair (Amii); Gauthier Gidel Mila, Université de Montréal; Simon Lacoste-Julien Mila, Université de Montréal |
| Pseudocode | Yes | Algorithm 1 SGD+Armijo(f, w0, η_max, b, c, β, γ, opt); Algorithm 2 reset(η, η_max, γ, b, k, opt); Algorithm 3 in Appendix H gives pseudo-code for SGD with the Goldstein line-search. (A minimal illustrative sketch of SGD with Armijo line-search is given after the table.) |
| Open Source Code | Yes | The code to reproduce our results can be found at https://github.com/IssamLaradji/sls. |
| Open Datasets | Yes | We experiment with four standard datasets: mushrooms, rcv1, ijcnn, and w8a from LIBSVM [14]. [...] For MNIST, we use a 1 hidden-layer multi-layer perceptron (MLP) of width 1000. For CIFAR10 and CIFAR100, we experiment with the standard image-classification architectures: ResNet-34 [28] and DenseNet-121 [29]. |
| Dataset Splits | No | The paper uses standard datasets (the LIBSVM datasets, MNIST, CIFAR10, and CIFAR100), but it does not explicitly state the train/validation/test splits (e.g., percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions general tasks like 'training deep networks' without specifying the underlying hardware. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). It mentions comparing against optimization methods like Adam, but does not provide specific software environment details or versions used for the implementation itself. |
| Experiment Setup | Yes | Appendix F gives additional details on our experimental setup and the default hyper-parameters used for the proposed line-search methods. |
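To make the pseudocode row concrete, the following is a minimal NumPy sketch of SGD with an Armijo backtracking line-search on a toy least-squares problem. It is an illustration of the technique only, not the authors' reference implementation (see the linked repository for that); the function names, the synthetic data, and the simple "reset the step size to η_max at every iteration" rule are assumptions made for brevity rather than the paper's reset heuristic.

```python
import numpy as np

def sgd_armijo(grad_fn, loss_fn, w0, data, n_epochs=10, batch_size=16,
               eta_max=1.0, c=0.1, beta=0.7):
    """Sketch of SGD with Armijo backtracking line-search.

    At each step the step size starts at eta_max (a simplifying assumption)
    and is multiplied by beta until the mini-batch loss satisfies the
    stochastic Armijo condition:
        f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2
    """
    X, y = data
    n = X.shape[0]
    w = w0.copy()
    for _ in range(n_epochs):
        for idx in np.array_split(np.random.permutation(n), n // batch_size):
            Xb, yb = X[idx], y[idx]
            g = grad_fn(w, Xb, yb)          # mini-batch gradient
            f_w = loss_fn(w, Xb, yb)        # mini-batch loss at current point
            eta = eta_max                   # simple reset heuristic (assumption)
            # Backtrack until the Armijo condition holds on this mini-batch.
            while loss_fn(w - eta * g, Xb, yb) > f_w - c * eta * np.dot(g, g):
                eta *= beta
                if eta < 1e-8:              # numerical safeguard
                    break
            w -= eta * g
    return w

# Toy usage: least-squares regression on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true
loss = lambda w, X, y: 0.5 * np.mean((X @ w - y) ** 2)
grad = lambda w, X, y: X.T @ (X @ w - y) / len(y)
w_hat = sgd_armijo(grad, loss, np.zeros(5), (X, y))
print("final loss:", loss(w_hat, X, y))
```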