AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods

Authors: Zhiming Zhou*, Qingru Zhang*, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically study the proposed method and compare it with Adam, AMSGrad and SGD on various tasks, in terms of training performance and generalization.
Researcher Affiliation | Academia | Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu (Shanghai Jiao Tong University)
Pseudocode | Yes | Algorithm 1 AdaShift: Temporal Shifting with Block-wise Spatial Operation; Algorithm 2 AdaShift: We use a first-in-first-out queue Q to denote the averaging window with the length of n. (A minimal sketch of this FIFO-queue update follows the table.)
Open Source Code | Yes | The anonymous code is provided at http://bit.ly/2NDXX6x.
Open Datasets | Yes | We further compare the proposed method with Adam, AMSGrad and SGD by using Logistic Regression and Multilayer Perceptron on MNIST... We test our algorithm with ResNet and DenseNet on the CIFAR-10 dataset... We further increase the complexity of the dataset, switching from CIFAR-10 to Tiny-ImageNet.
Dataset Splits | No | The paper does not explicitly state the train/validation/test splits for the datasets used (e.g., MNIST, CIFAR-10, Tiny-ImageNet), although standard splits are commonly implied for these benchmarks.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) used for its experiments; it only mentions a TensorFlow implementation in the context of the provided code, without naming any compute resources.
Software Dependencies | No | The paper mentions a TensorFlow implementation but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | Here, we list the hyper-parameter settings of all the above experiments. Table 5 (hyper-parameter setting of logistic regression in Figure 2; used in the toy example after the sketch below):
    Optimizer     | learning rate | β1  | β2    | n
    SGD           | 0.1           | N/A | N/A   | N/A
    Adam          | 0.001         | 0   | 0.999 | N/A
    AMSGrad       | 0.001         | 0   | 0.999 | N/A
    non-AdaShift  | 0.001         | 0   | 0.999 | 1
    max-AdaShift  | 0.01          | 0   | 0.999 | 1