Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond
Authors: Pan Zhou, Hanshu Yan, Xiaotong Yuan, Jiashi Feng, Shuicheng Yan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on CIFAR10/100 and ImageNet testify its advantages. |
| Researcher Affiliation | Collaboration | Sea AI Lab, Singapore; Nanjing University of Information Science & Technology, Nanjing, China; {zhoupan, yanhanshu, fengjs, yansc}@sea.com; xtyuan@nuist.edu.cn |
| Pseudocode | Yes | Algorithm 1: Lookahead Optimization Procedure (F_S(θ), η, T, α, k, θ_0, A, S) and Algorithm 2: Stagewise Locally-Regularized Lookahead (SLRLA); a minimal code sketch of the Lookahead step is given below the table. |
| Open Source Code | Yes | Code is available at https://github.com/sail-sg/SLRLA-optimizer. |
| Open Datasets | Yes | Experimental results on CIFAR10/100 and ImageNet testify its advantages. Code is available at https://github.com/sail-sg/SLRLA-optimizer. ... Here we investigate the effects of α on the performance of lookahead, stagewise lookahead [1] (SLA) and SLRLA on a regularized softmax problem with MNIST [56]. ... We evaluate SLA and SLRLA on CIFAR10/100 [58] and ImageNet [59] using different network architectures... |
| Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Please see the experimental settings in Sec. 6 and Appendix B. |
| Hardware Specification | Yes | We use two A100 GPUs on ImageNet, and use a single A100 GPU for all remaining experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | Following our theory, we use a linearly decayed learning rate (LR) for lookahead, and multi-step decayed LRs for SLA/SLRLA. See more details in Appendix B. ... For all experiments, SLRLA and SLA set k=5, a momentum of 0.9, and a multi-stage learning rate (LR) decay at the {0.3S, 0.6S, 0.8S}-th epoch with total epoch number S. On CIFAR10/100, we train 200 epochs with α=0.8, a weight decay of 10⁻³, and set the LR decay rate as 0.2. On ImageNet, we run 100 epochs using α=0.5, a weight decay of 10⁻⁴ and an LR decay rate of 0.1. ... For the regularization constant β_q, SLRLA selects it from {0.02, 0.2, 2.0, 20} via cross validation, and finally sets it as 0.2 on CIFAR10/100 and 20 on ImageNet. A hedged configuration sketch of this LR schedule follows the table. |
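
The Pseudocode row references the Lookahead procedure (Algorithm 1). As a rough illustration of the two-loop structure it describes, below is a minimal PyTorch-style sketch: k fast steps of an inner optimizer A (plain SGD here), followed by the slow-weight interpolation slow ← slow + α(fast − slow). The function name, the choice of SGD as the inner optimizer, the base learning rate, and the training-loop scaffolding are illustrative assumptions; the stagewise local regularization controlled by β_q in SLRLA (Algorithm 2) is not shown.

```python
import torch

def lookahead_train(model, loss_fn, data_loader, lr=0.1, alpha=0.8, k=5, momentum=0.9):
    """Minimal sketch of a Lookahead loop: k fast inner-optimizer updates,
    then slow weights move a fraction alpha toward the fast weights and the
    fast weights are reset to the slow weights."""
    inner = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)  # inner optimizer A
    slow = [p.detach().clone() for p in model.parameters()]               # slow weights

    for step, (x, y) in enumerate(data_loader, start=1):
        inner.zero_grad()
        loss_fn(model(x), y).backward()
        inner.step()                           # fast-weight update

        if step % k == 0:                      # every k fast steps
            with torch.no_grad():
                for s, p in zip(slow, model.parameters()):
                    s.add_(alpha * (p - s))    # slow <- slow + alpha * (fast - slow)
                    p.copy_(s)                 # reset fast weights to slow weights
```

With the CIFAR10/100 values quoted above, this sketch would be invoked with alpha=0.8 and k=5; the base learning rate of 0.1 is a placeholder, not a value from the paper.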
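
The Experiment Setup row quotes a multi-stage LR decay at the {0.3S, 0.6S, 0.8S}-th epoch with total epoch number S. One possible way to express that schedule, assuming PyTorch's MultiStepLR, is sketched below; the SGD wrapper and its base learning rate are assumptions not stated in the quoted text.

```python
import torch

def multistage_lr(optimizer, total_epochs, decay_rate):
    """LR is multiplied by decay_rate at the 0.3S-, 0.6S-, and 0.8S-th epoch of S epochs."""
    milestones = [int(r * total_epochs) for r in (0.3, 0.6, 0.8)]
    return torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=decay_rate)

# CIFAR10/100 values quoted above: S=200 epochs, decay rate 0.2, weight decay 1e-3,
# momentum 0.9 (alpha=0.8 and k=5 belong to the Lookahead wrapper, not to SGD itself).
# The base LR of 0.1 below is a placeholder, not taken from the quoted setup.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-3)
# scheduler = multistage_lr(optimizer, total_epochs=200, decay_rate=0.2)
# For ImageNet the quoted values are S=100, decay rate 0.1, weight decay 1e-4, alpha=0.5.
```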