Understanding Decoupled and Early Weight Decay

Authors: Johan Bjorck, Kilian Q. Weinberger, Carla Gomes (pp. 6777–6785)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that by applying WD only at the start, the network norm stays small throughout training. This has a regularizing effect, as the effective gradient updates become larger. However, traditional generalization metrics fail to capture this effect of WD, and we show how a simple scale-invariant metric can. We also show how the growth of network weights is heavily influenced by the dataset and its generalization properties. For decoupled WD, we perform experiments in NLP and RL, where adaptive optimizers are the norm.
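The quoted finding — weight decay applied only during an initial phase keeps the network norm small, which enlarges the effective gradient updates — can be illustrated with a toy SGD loop. This is a minimal sketch, not the paper's implementation; the function name, the scalar weight, and the zero-gradient example are all illustrative assumptions:

```python
def sgd_early_wd(w, grad_fn, lr=0.1, wd=0.1, wd_steps=50, total_steps=200):
    """SGD on a scalar weight; L2-style decay is applied only early on."""
    for step in range(total_steps):
        g = grad_fn(w)
        if step < wd_steps:      # weight decay only at the start of training
            g = g + wd * w
        w = w - lr * g
    return w

# With a flat loss (zero gradient), only the early decay phase moves w:
# it shrinks by a factor (1 - lr * wd) per step for wd_steps steps,
# then stays fixed for the remaining updates.
w_final = sgd_early_wd(1.0, lambda w: 0.0)
```

Because the decay stops after `wd_steps`, any later regularizing effect must come from the smaller weight norm itself rather than from an ongoing decay term, which is the mechanism the quoted passage describes.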
Researcher Affiliation | Academia | Johan Bjorck, Kilian Q. Weinberger, Carla P. Gomes, Cornell University, {njb225,kqw4,gomes}@cornell.edu
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper uses publicly available codebases like fairseq and dopamine, but does not state that the authors are releasing their own code for the methodology described.
Open Datasets | Yes | We replicate their experimental setup with identical hyperparameters (listed in the Appendix), training Resnet18 on Cifar10 and Resnet50 on Cifar100. We additionally provide experiments on tiny-imagenet [Karpathy, Li, and Johnson 2017 (accessed 2020-01-01)] using densenet 121 [Huang et al. 2017]. We first consider translation on the IWSLT 14 German-to-English dataset [Cettolo et al. 2014]... Secondly, we also consider the RL agent DQN [Mnih et al. 2015], using the publicly available dopamine codebase [Castro et al. 2018] with their default hyperparameters (see the Appendix), trained on a handful of Atari games...
Dataset Splits | No | The paper mentions using standard datasets like Cifar10/100, tiny-imagenet, IWSLT 14, and Atari games, but does not explicitly state train/validation/test splits or cross-validation methodologies in the main text.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper refers to using existing codebases like fairseq and dopamine, but does not list specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For investigating observations in Golatkar, Achille, and Soatto [2019] we replicate their experimental setup with identical hyperparameters (listed in the Appendix), training Resnet18 on Cifar10 and Resnet50 on Cifar100. Also: We consider λ ∈ {1e-3, 1e-4, 1e-5}, where the middle parameter is the default parameter used in fairseq; see the Appendix for all hyperparameters.
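The λ values in the setup above are decoupled weight-decay coefficients used alongside an adaptive optimizer. As a point of reference for what "decoupled" means here, a minimal sketch of a single decoupled update (AdamW-style, with the adaptive gradient rescaling omitted; function and argument names are illustrative, not from the paper or fairseq):

```python
def decoupled_wd_step(w, g, lr=1e-3, wd=1e-4):
    """One decoupled weight-decay update on a scalar weight.

    The decay term shrinks the weight directly, so it does not pass
    through the (omitted) adaptive rescaling applied to the gradient g.
    """
    w = w - lr * wd * w   # decoupled decay term, proportional to w itself
    w = w - lr * g        # plain gradient step (adaptive part omitted)
    return w
```

In a coupled formulation the decay would instead be added to `g` before any adaptive rescaling; decoupling is what makes λ behave consistently across parameters with different gradient magnitudes.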