Momentum Centering and Asynchronous Update for Adaptive Gradient Methods

Authors: Juntang Zhuang, Yifan Ding, Tommy Tang, Nicha Dvornek, Sekhar C Tatikonda, James Duncan

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate ACProp in extensive empirical studies: ACProp outperforms both SGD and other adaptive optimizers in image classification with CNN, and outperforms well-tuned adaptive optimizers in the training of various GAN models, reinforcement learning and transformers. [...] We conducted experiments on the MNIST dataset using a 2-layer MLP. We plot the average value of v_t for uncentered-type and s_t for centered-type optimizers; as Fig. 6(a,b) shows, we observe s_t ≤ v_t and the centered-type (ACProp, AdaBelief) converges faster, validating our analysis for early phases. (A worked form of this relation is sketched after the table.)
Researcher Affiliation | Academia | Juntang Zhuang (1); Yifan Ding (2); Tommy Tang (3); Nicha Dvornek (1); Sekhar Tatikonda (1); James S. Duncan (1). (1) Yale University; (2) University of Central Florida; (3) University of Illinois at Urbana-Champaign
Pseudocode | Yes | Algorithm 1: AdaBelief — Initialize x_0, m_0 ← 0, s_0 ← 0, t ← 0; While x_t not converged: t ← t + 1; g_t ← ∇_x f_t(x_{t−1}); m_t ← β_1 m_{t−1} + (1 − β_1) g_t; s_t ← β_2 s_{t−1} + (1 − β_2)(g_t − m_t)^2; x_t ← x_{t−1} − α m_t / (√s_t + ε). Algorithm 2: ACProp — Initialize x_0, m_0 ← 0, s_0 ← 0, t ← 0; While x_t not converged: t ← t + 1; g_t ← ∇_x f_t(x_{t−1}); m_t ← β_1 m_{t−1} + (1 − β_1) g_t; x_t ← Π(x_{t−1} − α g_t / (√s_{t−1} + ε)); s_t ← β_2 s_{t−1} + (1 − β_2)(g_t − m_t)^2. (A runnable sketch of both update rules appears after the table.)
Open Source Code | Yes | We provide the implementation at https://github.com/juntang-zhuang/ACProp-Optimizer.
Open Datasets | Yes | We conducted experiments on the MNIST dataset using a 2-layer MLP. [...] We first conducted experiments on the CIFAR10 image classification task with a VGG-11 [31], ResNet34 [6] and DenseNet-121 [32]. [...] for ResNet18 on ImageNet [...] We evaluated different optimizers on reinforcement learning with a deep Q-network (DQN) [21] on the four-rooms task [33]. [...] We evaluated the performance of ACProp on neural machine translation tasks with a transformer model [20]. [...] We conducted experiments with Deep Convolutional GAN (DCGAN) [35], Spectral-Norm GAN (SNGAN) [36], Self-Attention GAN (SAGAN) [37] and Relativistic-GAN (RLGAN) [38]. We set β_1 = 0.5, and search for β_2 and ε with the same schedule as the previous section. We report the FID [39] on the CIFAR10 dataset in Table 4.
Dataset Splits | No | The paper uses standard public datasets and mentions training, but does not provide explicit details on how the datasets were split into training, validation, and test sets (e.g., percentages, specific split files, or reference to a standard split name for validation).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as particular GPU or CPU models, memory specifications, or cloud instance types.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, specific library versions like PyTorch or TensorFlow) that would be needed for replication.
Experiment Setup | Yes | We performed extensive hyperparameter tuning in order to better compare the performance of different optimizers: for SGD we set the momentum as 0.9, which is the default for many cases [6, 32], and search the learning rate between 0.1 and 10^-5 in the log-grid; for other adaptive optimizers, including AdaBelief, Adam, RAdam, AdamW and AdaShift, we search the learning rate between 0.01 and 10^-5 in the log-grid, and search ε between 10^-5 and 10^-10 in the log-grid. We use a weight decay of 5e-2 for AdamW, and use 5e-4 for other optimizers. [...] For all optimizers, we set the learning rate as 0.0002, and search for β_1 ∈ {0.9, 0.99, 0.999}, β_2 ∈ {0.98, 0.99, 0.999} and ε ∈ {10^-5, 10^-6, ..., 10^-16}. (A sketch of this log-grid enumeration appears after the table.)
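The s_t ≤ v_t observation quoted in the Research Type row follows from a standard moment decomposition, under the approximation (ours, not a claim from the paper) that the exponential moving averages track the population moments of the gradient:

    v_t \approx \mathbb{E}[g_t^2] = \operatorname{Var}[g_t] + (\mathbb{E}[g_t])^2,
    \qquad
    s_t \approx \mathbb{E}[(g_t - m_t)^2] \approx \operatorname{Var}[g_t],

so s_t ≤ v_t, with the gap largest in the early phase where the mean gradient E[g_t] is far from zero; the centered methods then take the larger effective step α/(√s_t + ε).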
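The asynchronous ordering in Algorithm 2 can be made concrete with a minimal PyTorch-style sketch of both update rules for a single parameter tensor. This is an illustrative reconstruction of the quoted pseudocode only: bias correction, weight decay, and the projection Π are omitted, and the function names are hypothetical rather than taken from the linked repository.

    import torch

    def adabelief_step(param, grad, m, s, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Synchronous (Algorithm 1): s_t is computed first and used immediately.
        m = beta1 * m + (1 - beta1) * grad               # first moment m_t
        s = beta2 * s + (1 - beta2) * (grad - m) ** 2    # centered second moment s_t
        param = param - lr * m / (s.sqrt() + eps)        # step preconditioned by s_t
        return param, m, s

    def acprop_step(param, grad, m, s, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Asynchronous (Algorithm 2): the step is preconditioned by the previous
        # centered second moment s_{t-1}, which is what `s` still holds here.
        m = beta1 * m + (1 - beta1) * grad               # first moment m_t
        param = param - lr * grad / (s.sqrt() + eps)     # step preconditioned by s_{t-1}
        s = beta2 * s + (1 - beta2) * (grad - m) ** 2    # advance to s_t afterwards
        return param, m, s

With the pseudocode's initialization s_0 = 0, the very first step in either sketch is scaled by 1/ε; a practical implementation would add bias correction or a warm-up, which the sketch leaves out for brevity.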
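The log-grid search quoted in the Experiment Setup row can be enumerated as below; powers-of-ten spacing is an assumption, since the row only gives the endpoints and the phrase "log-grid".

    import itertools

    # Adaptive-optimizer grid from the quoted setup: learning rates from 1e-2
    # down to 1e-5, eps from 1e-5 down to 1e-10 (powers-of-ten spacing assumed).
    learning_rates = [10.0 ** -k for k in range(2, 6)]    # 1e-2, 1e-3, 1e-4, 1e-5
    epsilons       = [10.0 ** -k for k in range(5, 11)]   # 1e-5, ..., 1e-10

    grid = list(itertools.product(learning_rates, epsilons))
    print(f"{len(grid)} (lr, eps) configurations per adaptive optimizer")
    # Each configuration would then be trained once per optimizer
    # (AdaBelief, Adam, RAdam, AdamW, AdaShift) and compared on held-out accuracy.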