Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Adam-family Methods with Decoupled Weight Decay in Deep Learning

Authors: Kuangyu Ding, Nachuan Xiao, Kim-Chuan Toh

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Numerical experiments demonstrate that AdamD outperforms Adam and is comparable to AdamW, in the aspects of both generalization performance and efficiency. [...] In this section, we conduct numerical experiments to demonstrate the effectiveness of AdamD in the context of image classification and language modeling tasks.
Researcher Affiliation Academia Kuangyu Ding (EMAIL), Edwardson School of Industrial Engineering, Purdue University; Nachuan Xiao (EMAIL), School of Data Science, The Chinese University of Hong Kong, Shenzhen; Kim-Chuan Toh (EMAIL), Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore
Pseudocode Yes Algorithm 1 Adam with decoupled weight decay (AdamD) for nonsmooth problem (UOP). [...] Algorithm 2 AdamW (Loshchilov & Hutter, 2019).
Open Source Code No The paper does not explicitly state that source code for their methodology is released, nor does it provide a direct link to a code repository. It only mentions the implementation environment: "All experiments are conducted using an NVIDIA RTX 3090 Ti GPU and are implemented in Python 3.9 with PyTorch 1.12.0."
Open Datasets Yes Our image classification experiments include the deployment of well-established architectures, namely ResNet34 (He et al., 2016) and DenseNet121 (Huang et al., 2018), to train the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). Our language modeling experiments focus on LSTM networks applied to the Penn Treebank dataset (Marcus et al., 1993).
Dataset Splits Yes In all our experiments on image classification, we train the models consistently for 200 epochs, employing a batch size of 128. At the 150th epoch, we reduce the step size by a factor of 0.1. [...] In all our language modeling experiments, we train our models for 200 epochs using a batch size of 128. We employ a step size reduction strategy that decreases the learning rate to 0.1 times its previous value twice during training, specifically at the 75th and 150th epochs.
Hardware Specification Yes All experiments are conducted using an NVIDIA RTX 3090 Ti GPU and are implemented in Python 3.9 with PyTorch 1.12.0.
Software Dependencies Yes All experiments are conducted using an NVIDIA RTX 3090 Ti GPU and are implemented in Python 3.9 with PyTorch 1.12.0.
Experiment Setup Yes In all our experiments on image classification, we train the models consistently for 200 epochs, employing a batch size of 128. At the 150th epoch, we reduce the step size by a factor of 0.1. [...] For the weight decay parameter, we consider values σ ∈ {5×10⁻³, 10⁻³, 5×10⁻⁎, 10⁻⁎}. By fixing σ first, we ensure that all methods solve the same minimization problem. With σ fixed, we then perform a grid search over the learning rate η for AdamD, Adam, and AdamW using η ∈ {5×10⁻⁔, 10⁻⁎, 5×10⁻⁎, 10⁻³, 5×10⁻³, 10⁻ÂČ, 5×10⁻ÂČ, 10⁻¹}. Other parameters are set as follows: Adam/AdamW: We set Δ = 10⁻⁞, Ξ_k = 10⁻¹ and ÎČ = 10⁻³ as the default setting in PyTorch. AdamD: We set Ξ_s = Ξ_0 (log(s+2))^(−3/2), with s representing the epoch number. [...] Here, we set the initial momentum parameter to Ξ_0 = 10⁻¹, the second moment parameter to ÎČ = 10⁻³ and the regularization parameter to Δ = 10⁻⁞, which are the same as the default settings in PyTorch for Adam/AdamW.
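The decoupled weight decay referenced in the Pseudocode row (Algorithm 2, AdamW of Loshchilov & Hutter, 2019) can be contrasted with plain Adam-plus-L2 in a few lines. The sketch below is a minimal scalar illustration with helper names of my own choosing, not the paper's Algorithm 1 for the nonsmooth setting:

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, wd=1e-2):
    """One scalar AdamW step: the weight-decay term shrinks theta
    directly (decoupled), instead of being added to the gradient."""
    m = b1 * m + (1 - b1) * grad           # first moment estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    theta *= 1 - lr * wd                   # decoupled decay
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

def adam_l2_step(theta, grad, m, v, t, **kw):
    """Plain Adam with L2 regularization, for contrast: the decay term
    is folded into the gradient and so gets rescaled by the adaptive
    denominator, which is exactly what decoupling avoids."""
    wd = kw.pop("wd", 1e-2)
    return adamw_step(theta, grad + wd * theta, m, v, t, wd=0.0, **kw)
```

Running both steps from the same state produces slightly different iterates, which is the whole point of the distinction the paper studies.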
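The step-size schedule quoted in the Dataset Splits row (multiply by 0.1 at epoch 150 for image classification, and at epochs 75 and 150 for language modeling) is a standard step decay. A minimal sketch, with the function name being mine:

```python
def lr_at_epoch(epoch, base_lr, milestones, gamma=0.1):
    """Step-decay learning rate: multiply base_lr by gamma once for
    each milestone epoch that has already been reached."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[150]` (or `[75, 150]`) and `gamma=0.1`.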
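The tuning protocol quoted in the Experiment Setup row (fix the weight decay σ first so all methods solve the same problem, then sweep the learning rate η) and AdamD's diminishing momentum parameter Ξ_s = Ξ_0 (log(s+2))^(−3/2) can be sketched as follows; the grids are taken from the quote, the function names are mine:

```python
import itertools
import math

WD_GRID = [5e-3, 1e-3, 5e-4, 1e-4]                           # sigma values
LR_GRID = [5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1]   # eta values

def hyperparameter_grid():
    """Enumerate (sigma, eta) pairs: sigma is fixed first so that every
    method minimizes the same objective, then eta is grid-searched."""
    yield from itertools.product(WD_GRID, LR_GRID)

def theta_s(s, theta0=0.1):
    """AdamD's momentum parameter at epoch s:
    theta_s = theta0 * (log(s + 2))^(-3/2), diminishing over epochs."""
    return theta0 * math.log(s + 2) ** -1.5
```

The 4×8 grid yields 32 configurations per method, and the schedule decays slowly enough that the momentum parameter stays useful throughout the 200 training epochs.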