Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models

Authors: Leonardo Galli, Holger Rauhut, Mark Schmidt

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments show that nonmonotone methods improve the speed of convergence and generalization properties of SGD/Adam even beyond the previous monotone line searches." |
| Researcher Affiliation | Academia | Leonardo Galli, Holger Rauhut (RWTH Aachen University, Aachen; {galli, rauhut}@mathc.rwth-aachen.de); Mark Schmidt (University of British Columbia, Canada; CIFAR AI Chair, Amii; schmidtm@cs.ubc.ca) |
| Pseudocode | No | The paper describes the proposed methods and equations in the main text, but it does not include a distinct pseudocode block or algorithm listing. |
| Open Source Code | No | The paper makes no explicit statement about releasing its source code and provides no link to a code repository. |
| Open Datasets | Yes | "In particular, we focus on the datasets MNIST, Fashion MNIST, CIFAR10, CIFAR100 and SVHN, addressed with the architectures MLP [Luo et al., 2019], EfficientNet-b1 [Tan and Le, 2019], ResNet-34 [He et al., 2016], DenseNet-121 [Huang et al., 2017] and WideResNet [Zagoruyko and Komodakis, 2016]." |
| Dataset Splits | No | The paper mentions "train loss" and "test accuracy" on standard datasets, but it does not state how each dataset was split into training, validation, and test sets (e.g., percentages, specific split files, or citations to standard split methodologies). |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., CPU or GPU models, memory, or cluster configuration). |
| Software Dependencies | No | The paper does not give version numbers for the software libraries used in the experiments (e.g., "Python 3.8, PyTorch 1.9"). |
| Experiment Setup | No | The paper states that "the learning rate of SGD and Adam has been chosen through a grid-search" and defers implementation details and hyperparameter sensitivity to supplementary Section C, but the main text does not list the specific hyperparameter values or training configurations. |
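Since the paper provides no pseudocode, the following is a minimal sketch of a *generic* nonmonotone backtracking line search of the Grippo-Lampariello-Lucidi (max-type) family, which the paper's relaxation builds on. It is not the authors' exact stochastic method; the function names, the window size `M = 5`, and the toy quadratic objective are all illustrative choices.

```python
import numpy as np

def nonmonotone_backtracking(f, grad, x, d, history, c=1e-4, beta=0.5, max_iter=50):
    """Backtracking line search with a nonmonotone (max-type) Armijo condition.

    Instead of requiring descent relative to f(x), a step is accepted if it
    descends relative to the maximum of the last few objective values
    (`history`), so individual steps may temporarily increase the loss.
    All names and defaults here are illustrative, not the paper's.
    """
    f_ref = max(history)        # nonmonotone reference value
    slope = grad @ d            # directional derivative (negative for descent)
    alpha = 1.0
    for _ in range(max_iter):
        if f(x + alpha * d) <= f_ref + c * alpha * slope:
            return alpha
        alpha *= beta           # shrink the step and retry
    return alpha

# Toy usage: steepest descent on f(x) = ||x||^2 with a window of M = 5 values.
f = lambda x: float(x @ x)
x = np.array([2.0, -1.0])
history = [f(x)]
for _ in range(10):
    g = 2 * x                   # gradient of ||x||^2
    alpha = nonmonotone_backtracking(f, g, x, -g, history[-5:])
    x = x + alpha * (-g)
    history.append(f(x))

print(history[-1] < history[0])  # → True
```

With `M = 1` (a window containing only the current value), the acceptance test reduces to the ordinary monotone Armijo condition; enlarging the window is what relaxes monotonicity.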