AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
Authors: Zhiming Zhou*, Qingru Zhang*, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically study the proposed method and compare them with Adam, AMSGrad and SGD, on various tasks in terms of training performance and generalization. |
| Researcher Affiliation | Academia | Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu, Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1 AdaShift: Temporal Shifting with Block-wise Spatial Operation; Algorithm 2 AdaShift: "We use a first-in-first-out queue Q to denote the averaging window with the length of n." A hedged sketch of this update appears after the table. |
| Open Source Code | Yes | The anonymous code is provided at http://bit.ly/2NDXX6x. |
| Open Datasets | Yes | We further compare the proposed method with Adam, AMSGrad and SGD by using Logistic Regression and Multilayer Perceptron on MNIST... We test our algorithm with ResNet and DenseNet on CIFAR-10 datasets... We further increase the complexity of dataset, switching from CIFAR-10 to Tiny-ImageNet. |
| Dataset Splits | No | The paper does not explicitly state the train/validation/test splits for the datasets used (e.g., MNIST, CIFAR-10, Tiny-ImageNet), although standard splits are often implied for these common benchmarks. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) used for running its experiments. It mentions a 'Tensorflow implementation' in the context of the provided code and refers to tasks such as an 'ill-conditioned quadratic problem', but gives no hardware specifics. |
| Software Dependencies | No | The paper mentions a 'Tensorflow implementation' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | Here, we list all hyper-parameter setting of all above experiments. Table 5 (hyper-parameter setting of logistic regression in Figure 2): SGD: learning rate 0.1; Adam: learning rate 0.001, β1 = 0, β2 = 0.999; AMSGrad: learning rate 0.001, β1 = 0, β2 = 0.999; non-AdaShift: learning rate 0.001, β1 = 0, β2 = 0.999, n = 1; max-AdaShift: learning rate 0.01, β1 = 0, β2 = 0.999, n = 1 |
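
The pseudocode quoted in the table describes AdaShift's core idea: keep a first-in-first-out queue Q of recent gradients as an averaging window of length n, and update the second moment with a temporally shifted (older) gradient so that it is decorrelated from the gradient used for the step direction. The following is a minimal NumPy sketch of that idea, not the authors' released TensorFlow code; the class name `AdaShiftSketch`, the `phi` argument, and the epsilon constant are illustrative assumptions.

```python
from collections import deque
import numpy as np

class AdaShiftSketch:
    """Illustrative temporal-shifting update (a sketch; details are assumptions, not the released code)."""

    def __init__(self, lr=0.001, beta1=0.0, beta2=0.999, n=1, phi=None):
        self.lr, self.beta1, self.beta2, self.n = lr, beta1, beta2, n
        self.phi = phi if phi is not None else (lambda x: x)  # identity -> non-AdaShift; np.max -> max-AdaShift
        self.queue = deque(maxlen=n)  # first-in-first-out averaging window Q of length n
        self.v = 0.0                  # second-moment accumulator

    def step(self, theta, grad):
        grad = np.asarray(grad, dtype=float)
        if len(self.queue) == self.n:
            # Temporal shift: the gradient leaving the window updates the second moment,
            # so v is decorrelated from the gradients that form the step direction.
            shifted = self.queue.popleft()
            self.v = self.beta2 * self.v + (1.0 - self.beta2) * self.phi(shifted ** 2)
            # First moment: beta1-weighted average over the window; newest gradient gets weight beta1^0 = 1.
            powers = self.beta1 ** np.arange(self.n - 1, -1, -1)
            window = list(self.queue) + [grad]
            m = sum(w * g for w, g in zip(powers, window)) / powers.sum()
            theta = theta - self.lr * m / (np.sqrt(self.v) + 1e-8)
        self.queue.append(grad)
        return theta
```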
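
Assuming this sketch, the Table 5 settings quoted above would map to roughly `AdaShiftSketch(lr=0.001, beta1=0, beta2=0.999, n=1)` for non-AdaShift and `AdaShiftSketch(lr=0.01, beta1=0, beta2=0.999, n=1, phi=np.max)` for max-AdaShift, with SGD, Adam and AMSGrad as baselines at the listed learning rates.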