Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models
Authors: Leonardo Galli, Holger Rauhut, Mark Schmidt
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that nonmonotone methods improve the speed of convergence and generalization properties of SGD/Adam even beyond the previous monotone line searches. |
| Researcher Affiliation | Academia | Leonardo Galli, Holger Rauhut (RWTH Aachen University, Aachen; {galli, rauhut}@mathc.rwth-aachen.de); Mark Schmidt (University of British Columbia; Canada CIFAR AI Chair, Amii; schmidtm@cs.ubc.ca) |
| Pseudocode | No | The paper describes the proposed methods and equations in the main text, but it does not include a distinct pseudocode block or algorithm listing (a hedged illustrative sketch of the general idea is given below the table). |
| Open Source Code | No | The paper does not provide any explicit statements about the release of its source code or links to a code repository. |
| Open Datasets | Yes | In particular, we focus on the datasets MNIST, Fashion MNIST, CIFAR10, CIFAR100 and SVHN, addressed with the architectures MLP [Luo et al., 2019], EfficientNet-b1 [Tan and Le, 2019], ResNet-34 [He et al., 2016], DenseNet-121 [Huang et al., 2017] and Wide ResNet [Zagoruyko and Komodakis, 2016]. |
| Dataset Splits | No | The paper mentions 'train loss' and 'test accuracy' on standard datasets, but it does not explicitly describe how each dataset was split into training, validation, and test sets (e.g., percentages, specific split files, or citations to standard splits). |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments (e.g., CPU or GPU models, memory specifications, or cluster configurations). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the implementation of the experiments (e.g., 'Python 3.8, PyTorch 1.9'). |
| Experiment Setup | No | The paper mentions that 'the learning rate of SGD and Adam has been chosen through a grid-search' and refers to 'implementation details (Section C)' in the supplementary materials for hyperparameter sensitivity, but it does not provide the specific hyperparameter values or detailed training configurations in the main text. |
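Since the paper provides neither pseudocode nor released code, the following is a minimal sketch of the general idea behind a nonmonotone stochastic line search: the Armijo sufficient-decrease condition is checked against the maximum of the last M observed mini-batch losses rather than the current loss alone, so occasional increases in the objective are tolerated. All names, constants, and the Grippo-style max rule here are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def nonmonotone_sgd_step(w, loss_fn, grad_fn, history,
                         eta0=1.0, c=0.1, beta=0.5, M=10, max_backtracks=30):
    """One SGD step with a nonmonotone Armijo backtracking line search.

    Hypothetical sketch (Grippo-style max rule), not the paper's method:
    sufficient decrease is measured against the max of the last M losses.
    """
    g = grad_fn(w)
    history.append(loss_fn(w))
    ref = max(history[-M:])                      # nonmonotone reference value
    eta = eta0
    for _ in range(max_backtracks):
        # Armijo test relaxed to the nonmonotone reference `ref`
        if loss_fn(w - eta * g) <= ref - c * eta * np.dot(g, g):
            break
        eta *= beta                              # backtrack the step size
    return w - eta * g

# Toy usage: quadratic f(w) = 0.5 * ||w||^2 with noisy (stochastic) gradients.
rng = np.random.default_rng(0)
w, losses = np.ones(5), []
for _ in range(100):
    noise = 0.1 * rng.standard_normal(5)
    w = nonmonotone_sgd_step(w, loss_fn=lambda v: 0.5 * v @ v,
                             grad_fn=lambda v, n=noise: v + n,
                             history=losses)
print(f"final loss: {0.5 * w @ w:.4f}")          # should be near 0
```

The key design choice, in line with the paper's thesis, is that `ref` replaces the current loss in the Armijo condition, relaxing the monotone requirement that every accepted step decrease the objective.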