Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
Authors: Zhiming Zhou*, Qingru Zhang*, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu
ICLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically study the proposed method and compare it with Adam, AMSGrad and SGD on various tasks, in terms of training performance and generalization. |
| Researcher Affiliation | Academia | Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu; Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1 AdaShift: Temporal Shifting with Block-wise Spatial Operation; Algorithm 2 AdaShift: We use a first-in-first-out queue Q to denote the averaging window with the length of n. |
| Open Source Code | Yes | The anonymous code is provided at http://bit.ly/2NDXX6x. |
| Open Datasets | Yes | We further compare the proposed method with Adam, AMSGrad and SGD by using Logistic Regression and Multilayer Perceptron on MNIST... We test our algorithm with ResNet and DenseNet on the CIFAR-10 dataset... We further increase the complexity of dataset, switching from CIFAR-10 to Tiny-ImageNet. |
| Dataset Splits | No | The paper does not explicitly state the train/validation/test splits for the datasets used (e.g., MNIST, CIFAR-10, Tiny-ImageNet), although standard splits are often implied for these common benchmarks. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running its experiments. It mentions a 'Tensorflow implementation' in the context of the provided code, and general terms like 'ill-conditioned quadratic problem' that might be run on 'HPC Resource' but without specifics. |
| Software Dependencies | No | The paper mentions a 'Tensorflow implementation' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | Here, we list all hyper-parameter settings of all above experiments. Table 5 (hyper-parameters for logistic regression, Figure 2): SGD — learning rate 0.1; Adam — learning rate 0.001, β1 = 0, β2 = 0.999; AMSGrad — learning rate 0.001, β1 = 0, β2 = 0.999; non-AdaShift — learning rate 0.001, β1 = 0, β2 = 0.999, n = 1; max-AdaShift — learning rate 0.01, β1 = 0, β2 = 0.999, n = 1. |
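The pseudocode row above describes AdaShift's core idea: a first-in-first-out queue holds the last n gradients, and the second-moment estimate v is updated with the gradient from n steps earlier rather than the current one, decorrelating v_t from g_t. The sketch below illustrates this temporal shifting for the simplest case (β1 = 0, identity spatial function, i.e. non-AdaShift); the function name, signature, and warm-up handling are illustrative assumptions, not the paper's reference implementation.

```python
from collections import deque
import numpy as np

def adashift_sketch(grad_fn, theta, lr=0.001, beta2=0.999, n=1,
                    steps=100, eps=1e-8):
    """Hedged sketch of the non-AdaShift update (beta1 = 0).

    The second moment v is driven by the gradient from n steps ago
    (popped from a FIFO queue), while the parameter step uses the
    current gradient g_t, so v_t and g_t are decorrelated.
    """
    queue = deque()               # FIFO window of the last n gradients
    v = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        queue.append(g)
        if len(queue) <= n:       # warm-up: wait until the window is full
            continue
        g_shifted = queue.popleft()                 # gradient from n steps ago
        v = beta2 * v + (1 - beta2) * g_shifted**2  # shifted second moment
        theta = theta - lr * g / (np.sqrt(v) + eps)
    return theta
```

For example, minimizing the quadratic f(x) = x² (gradient 2x) with this update drives x toward zero while v is always estimated from a past, uncorrelated gradient.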