AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

Authors: Byeongho Heo, Sanghyuk Chun, Seong Joon Oh, Dongyoon Han, Sangdoo Yun, Gyuwan Kim, Youngjung Uh, Jung-Woo Ha

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Given the ubiquity of momentum GD and scale invariance in machine learning, we have evaluated our methods against the baselines on 13 benchmarks. They range from vision tasks like classification (e.g. ImageNet), retrieval (e.g. CUB and SOP), and detection (e.g. COCO) to language modelling (e.g. WikiText) and audio classification (e.g. DCASE) tasks. We verify that our solution brings about uniform gains in performances in those benchmarks.
Researcher Affiliation | Collaboration | Naver AI Lab, Naver Clova, and Applied Information Engineering, Yonsei University
Pseudocode | Yes | The proposed method is readily adaptable to existing gradient-based optimization algorithms like SGD and Adam. Their modifications, SGDP and AdamP, are shown in Algorithms 1 and 2, respectively (modifications are colorized). A minimal sketch of the shared projection step appears after this table.
Open Source Code | Yes | Source code is available at https://github.com/clovaai/adamp.
Open Datasets | Yes | ImageNet-1K benchmark (Russakovsky et al., 2015)... MS-COCO dataset (Lin et al., 2014)... CIFAR-10... MagnaTagATune (MTAT) dataset (Law et al., 2009)... Speech Commands dataset (Warden, 2018)... DCASE 2017 challenge (Mesaros et al., 2017)... CUB (Wah et al., 2011), Cars-196 (Krause et al., 2013), In-Shop (Liu et al., 2016b), and SOP (Oh Song et al., 2016) benchmarks.
Dataset Splits | Yes | We have searched the best hyperparameters for the Adam optimizer on the MTAT validation dataset and have transferred them to AdamP experiments.
Hardware Specification | Yes | The training sessions are run for 100 epochs (ResNet18, ResNet50) or 150 epochs (MobileNetV2, ResNet50 + CutMix) with the cosine learning rate schedule (Loshchilov & Hutter, 2016) on a machine with four NVIDIA V100 GPUs.
Software Dependencies | No | The paper states 'All experiments are conducted based on PyTorch,' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Experiments on ResNet (He et al., 2016) are conducted based on the standard settings: learning rate 0.1, weight decay 10⁻⁴, batch size 256, momentum 0.9 with Nesterov (Sutskever et al., 2013) for SGD and SGDP. For the Adam series, we use learning rate 0.001, weight decay 10⁻⁴, batch size 256, β1 = 0.9, β2 = 0.999, ε = 10⁻⁸. A usage sketch with these settings appears after this table.
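
As noted in the Pseudocode row, Algorithms 1 and 2 (SGDP and AdamP) differ from plain SGD and Adam only by a projection that removes the radial component of the update on weights that appear scale-invariant. The following is a minimal PyTorch sketch of that projection step, not the official implementation: the function and argument names are hypothetical, the delta=0.1 default is an assumption, and the released code additionally checks a layer-wise view and applies a reduced weight-decay ratio, which this sketch omits.

```python
import math
import torch

def _channel_view(x):
    # Flatten each output channel so cosine similarity and the projection
    # can be computed per channel.
    return x.reshape(x.size(0), -1)

def project_if_scale_invariant(weight, grad, update, delta=0.1, eps=1e-8):
    """Return `update` with its radial component removed when `weight` looks
    scale-invariant, i.e. when cos(weight, grad) is close to zero."""
    w = _channel_view(weight)
    g = _channel_view(grad)
    cosine = (w * g).sum(dim=1).abs() / (w.norm(dim=1) * g.norm(dim=1) + eps)
    # Threshold used in the paper: delta / sqrt(channel dimension)
    if cosine.max() < delta / math.sqrt(w.size(1)):
        u = _channel_view(update)
        w_hat = w / (w.norm(dim=1, keepdim=True) + eps)
        u = u - (u * w_hat).sum(dim=1, keepdim=True) * w_hat  # drop radial part
        return u.view_as(update)
    return update
```

In SGDP this projection is applied to the momentum-based step and in AdamP to the bias-corrected Adam step, before the learning rate is applied.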
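
The Open Source Code and Experiment Setup rows together pin down the optimizer configuration used for the ResNet experiments. Below is a hedged usage sketch against the released package (pip install adamp, from https://github.com/clovaai/adamp); constructor keyword names may differ across package versions, so treat them as assumptions rather than a verified API, and note that the batch size (256) and the cosine learning-rate schedule live outside the optimizer itself.

```python
import torchvision
from adamp import AdamP, SGDP  # pip install adamp

model = torchvision.models.resnet18()

# SGD-style settings from the Experiment Setup row:
# lr 0.1, momentum 0.9 with Nesterov, weight decay 1e-4.
sgdp = SGDP(model.parameters(), lr=0.1, momentum=0.9, nesterov=True,
            weight_decay=1e-4)

# Adam-style settings from the Experiment Setup row:
# lr 0.001, betas (0.9, 0.999), eps 1e-8, weight decay 1e-4.
adamp = AdamP(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=1e-4)

# Either optimizer is then driven like any torch.optim optimizer:
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
```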