Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

Authors: Atli Kosson, Bettina Messmer, Martin Jaggi

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of applied analysis and experimentation. We experimentally validate that the results hold for NN training in practice. |
| Researcher Affiliation | Academia | EPFL, Switzerland. |
| Pseudocode | Yes | Algorithm 1: Rotational Wrapper for constrained dynamics. (A hedged sketch of such a wrapper follows the table.) |
| Open Source Code | No | The paper uses and cites several open-source libraries (e.g., TIMM, fairseq, nanoGPT, LLM-Baselines) but does not explicitly state that code for its own methodology is released, nor does it provide a link to it. |
| Open Datasets | Yes | We perform our experiments on several popular datasets, i.e., CIFAR-10/100 (Krizhevsky, 2009) and ImageNet-1k (Russakovsky et al., 2015) for image classification, IWSLT2014 (Cettolo et al., 2014) for German-English translation, and WikiText (Merity et al., 2017) and OpenWebText (Radford et al., 2019) for language modelling. |
| Dataset Splits | Yes | For the sweep we train a ResNet-18 on a 90/10 train/val split from the original train set. We train on a random subset containing 90% of the train set and use the remaining 10% for validation, which we report. (See the split sketch below.) |
| Hardware Specification | Yes | Most of the experiments are run on a single NVIDIA A100-SXM4-40GB GPU. |
| Software Dependencies | No | The code utilizes the TIMM library (Wightman, 2019) for vision tasks, fairseq (Ott et al., 2019) for translation, and nanoGPT (Karpathy, 2023) and LLM-Baselines (Pagliardini, 2023) for language modelling. While these libraries are mentioned, specific version numbers for them or other core software dependencies are not provided. |
| Experiment Setup | Yes | Table 4 of the paper (experimental setup, including training and test set definitions) details the learning rate, warmup, epochs, schedule, precision, and the specific weight decay and beta values for each optimizer and model. (An illustrative skeleton follows below.) |
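
For readers who want a concrete picture of the Pseudocode row, the following is a minimal sketch of how a rotational wrapper of this kind could look in PyTorch. It is our own reconstruction, not the authors' code (which, per the Open Source Code row, we did not find): the class name `RotationalWrapper`, the whole-tensor treatment (the paper operates per neuron), and the equilibrium angular rate sqrt(2 * lr * wd), which matches the paper's SGD analysis, are all assumptions.

```python
# Hypothetical sketch of a rotational wrapper in the spirit of Algorithm 1.
# NOT the authors' implementation: names, the whole-tensor treatment, and
# the SGD-style equilibrium rate are our assumptions.
import torch


class RotationalWrapper:
    """Keeps wrapped parameters at a fixed norm and a fixed average rotation
    per step, delegating update directions to an inner optimizer."""

    def __init__(self, params, inner_optimizer, lr, weight_decay):
        self.params = [p for p in params if p.requires_grad]
        self.inner = inner_optimizer  # should be run WITHOUT weight decay
        # Equilibrium angular update per step for SGD (paper's analysis).
        self.eta_r = (2.0 * lr * weight_decay) ** 0.5
        # Fix each parameter's norm at its initial value.
        self.norms = [p.detach().norm() for p in self.params]

    @torch.no_grad()
    def step(self):
        before = [p.detach().clone() for p in self.params]
        self.inner.step()  # inner optimizer proposes an update in place
        for p, old, n0 in zip(self.params, before, self.norms):
            delta = p - old
            # Remove the radial (norm-changing) component of the update.
            radial = (delta * old).sum() / old.norm().pow(2) * old
            tangential = delta - radial
            t_norm = tangential.norm()
            if t_norm > 0:
                # Scale the tangential step to the equilibrium rotation
                # rate, then project back onto the fixed-norm sphere.
                new = old + self.eta_r * old.norm() * tangential / t_norm
                p.copy_(new * (n0 / new.norm()))
            else:
                p.copy_(old)
```

A usage sketch under the same assumptions: the norm constraint replaces explicit weight decay, so the inner optimizer is configured without it.

```python
inner = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
opt = RotationalWrapper(model.parameters(), inner, lr=0.1, weight_decay=5e-4)
```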
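
The 90/10 split in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch using torchvision's CIFAR-10 loader; the fixed seed is our assumption, since the paper only states that the subset is random.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Recreate a 90/10 train/val split of the CIFAR-10 train set.
# The seed (0) is illustrative; the paper only says the subset is random.
full_train = datasets.CIFAR10(root="data", train=True, download=True,
                              transform=transforms.ToTensor())
n_train = int(0.9 * len(full_train))   # 45,000 examples
n_val = len(full_train) - n_train      # 5,000 examples
train_set, val_set = random_split(full_train, [n_train, n_val],
                                  generator=torch.Generator().manual_seed(0))
```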
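
Finally, as a reading aid for the Experiment Setup row, the skeleton below lists the fields that the paper's Table 4 reports. Every value shown is an illustrative placeholder, not a setting taken from the paper.

```python
# Skeleton of the hyperparameter fields reported in the paper's Table 4.
# All values are placeholders, NOT the paper's actual settings.
experiment_setup = {
    "optimizer": "SGDM",      # placeholder optimizer name
    "learning_rate": 1e-1,    # placeholder
    "warmup_steps": 500,      # placeholder
    "epochs": 100,            # placeholder
    "schedule": "cosine",     # placeholder
    "precision": "bf16",      # placeholder
    "weight_decay": 5e-4,     # placeholder
    "betas": (0.9, 0.999),    # placeholder (Adam-style optimizers)
}
```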