Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Authors: Atli Kosson, Bettina Messmer, Martin Jaggi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of analysis and experimentation, and experimentally validates that the results hold for neural network training in practice. |
| Researcher Affiliation | Academia | EPFL, Switzerland. |
| Pseudocode | Yes | Algorithm 1: Rotational Wrapper for constrained dynamics. (An illustrative sketch of this kind of wrapper is given below the table.) |
| Open Source Code | No | The paper uses and cites several open-source libraries (e.g., TIMM, fairseq, nanoGPT, LLM-Baselines) but does not explicitly state that the code for the described methodology is released, nor does it provide a link to it. |
| Open Datasets | Yes | We perform our experiments on several popular datasets, i.e., CIFAR-10/100 (Krizhevsky, 2009) and ImageNet-1k (Russakovsky et al., 2015) for image classification, IWSLT2014 (Cettolo et al., 2014) for German-English translation, and Wikitext (Merity et al., 2017) and OpenWebText (Radford et al., 2019) for language modelling. |
| Dataset Splits | Yes | For the sweep we train a ResNet-18 on a 90/10 train/val split from the original train set. We train on a random subset containing 90% of the train set and use the remaining 10% for validation, which we report. (A hedged split sketch is given below the table.) |
| Hardware Specification | Yes | Most of the experiments are run on a single NVIDIA A100-SXM4-40GB GPU. |
| Software Dependencies | No | Our code utilizes the TIMM library (Wightman, 2019) for vision tasks, fairseq (Ott et al., 2019) for translation, and nanoGPT (Karpathy, 2023) and LLM-Baselines (Pagliardini, 2023) for language modelling. While these libraries are mentioned, specific version numbers for them or other core software dependencies are not provided. |
| Experiment Setup | Yes | Table 4 (Experimental setup, including training and test set definitions) provides details on learning rate, warmup, epochs, schedule, precision, and specific weight decay and beta values for various optimizers and models. (An illustrative configuration sketch is given below the table.) |
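Regarding the Pseudocode row: the paper's Algorithm 1 wraps an inner optimizer so that each neuron's weight vector keeps a fixed norm and rotates by a controlled angle per step. Below is a minimal PyTorch-style sketch of that general idea only; the function name `rotational_step`, the `target_angle` parameter, and the small-angle scaling are illustrative assumptions, not the authors' exact algorithm (which controls the average angular update of a full optimizer).

```python
import torch

def rotational_step(weight: torch.Tensor,
                    update: torch.Tensor,
                    target_angle: float) -> torch.Tensor:
    """Rotate `weight` by roughly `target_angle` radians in the direction of
    `update`, keeping the weight norm fixed.

    Illustrative sketch only; not the paper's Algorithm 1.
    """
    w_norm = weight.norm()
    # Remove the radial (norm-changing) component of the update.
    radial_coeff = torch.dot(update.flatten(), weight.flatten()) / (w_norm ** 2)
    tangential = update - radial_coeff * weight
    # Small-angle approximation: a tangential step of length ||w|| * angle
    # rotates the weight vector by approximately `angle` radians.
    step = tangential / (tangential.norm() + 1e-12) * (w_norm * target_angle)
    new_weight = weight - step
    # Re-project onto the sphere of the original norm.
    return new_weight * (w_norm / new_weight.norm())

# Usage: apply per neuron weight vector after computing a raw optimizer update.
w = torch.randn(256)
g = torch.randn(256)          # stand-in for an optimizer's raw update
w = rotational_step(w, g, target_angle=0.01)
```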
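Regarding the Dataset Splits row: the paper specifies only the 90/10 proportions of the CIFAR-10 train/validation split, not the splitting code. A minimal sketch using torchvision's CIFAR-10 loader and `random_split` is shown below; the seed value is an arbitrary assumption.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Load the full CIFAR-10 training set (the paper's sweep trains ResNet-18 on it).
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

val_size = len(full_train) // 10           # 10% held out for validation (5,000 images)
train_size = len(full_train) - val_size    # remaining 90% for training (45,000 images)

# Seed chosen arbitrarily here; the paper does not report the split seed.
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(full_train, [train_size, val_size],
                                  generator=generator)
```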
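Regarding the Experiment Setup row: to make the kinds of settings listed in Table 4 (learning rate, warmup, epochs, schedule, weight decay, betas) concrete, here is a hedged sketch of how they might be wired up with PyTorch's AdamW and a warmup-plus-cosine schedule. All numeric values are placeholders, not the paper's reported settings.

```python
import math
import torch

# Placeholder values standing in for the kinds of settings listed in Table 4.
config = {
    "lr": 1e-3,
    "weight_decay": 0.1,
    "betas": (0.9, 0.999),
    "warmup_steps": 500,
    "total_steps": 10_000,
}

model = torch.nn.Linear(128, 10)  # stand-in model

optimizer = torch.optim.AdamW(model.parameters(),
                              lr=config["lr"],
                              betas=config["betas"],
                              weight_decay=config["weight_decay"])

def lr_lambda(step: int) -> float:
    """Linear warmup followed by cosine decay, a common schedule shape."""
    if step < config["warmup_steps"]:
        return step / max(1, config["warmup_steps"])
    progress = (step - config["warmup_steps"]) / max(
        1, config["total_steps"] - config["warmup_steps"])
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```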