Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum
Authors: Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate that the proposed adaptive inertia method can generalize significantly better than SGD and conventional adaptive gradient methods. |
| Researcher Affiliation | Collaboration | ¹The University of Tokyo, ²RIKEN Center for AIP, ³Microsoft Research Asia. |
| Pseudocode | Yes | Algorithm 1: Adam; Algorithm 2: Adai; Algorithm 3: AdaiS/AdaiW |
| Open Source Code | Yes | Code: https://github.com/zeke-xie/adaptive-inertia-adai |
| Open Datasets | Yes | Datasets: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), ImageNet (Deng et al., 2009), and Penn Treebank (Marcus et al., 1993). |
| Dataset Splits | No | The paper does not explicitly provide percentages, sample counts, or citations for train/validation/test splits. It refers to standard datasets and discusses 'Test performance comparison', which implies validation sets are used for hyperparameter tuning. |
| Hardware Specification | Yes | The experiments are conducted on a computing cluster with NVIDIA® Tesla™ P100 16GB GPUs and Intel® Xeon® CPU E5-2640 v3 @ 2.60GHz. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | Hyperparameter Settings for CIFAR-10 and CIFAR-100: We select the optimal learning rate for each experiment from {0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10}... The batch size is set to 128 for CIFAR-10 and CIFAR-100. The L2 regularization hyperparameter is set to λ = 0.0005 for CIFAR-10 and CIFAR-100. |
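
The quoted experiment setup translates directly into a small grid-search script. The sketch below is not the authors' code: it assumes a standard PyTorch/torchvision pipeline and uses `torch.optim.SGD` as a stand-in optimizer, since the exact constructor of the Adai optimizer in the linked repository is not reproduced in this table. The learning-rate grid, batch size 128, and L2 regularization λ = 0.0005 follow the quoted settings.

```python
# Minimal sketch (not the authors' code) of the quoted CIFAR-10 hyperparameter search:
# lr in {1e-5, ..., 10}, batch size 128, weight decay 5e-4. torch.optim.SGD is a
# stand-in; the Adai optimizer from https://github.com/zeke-xie/adaptive-inertia-adai
# would be swapped in at the same place.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

LEARNING_RATES = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]  # grid quoted above
BATCH_SIZE = 128
WEIGHT_DECAY = 5e-4   # lambda = 0.0005 for CIFAR-10 and CIFAR-100
EPOCHS = 2            # illustrative only; the paper trains far longer

device = "cuda" if torch.cuda.is_available() else "cpu"

transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)

def evaluate(model):
    """Return test accuracy of the given model."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return correct / total

best_lr, best_acc = None, 0.0
for lr in LEARNING_RATES:
    model = torchvision.models.resnet18(num_classes=10).to(device)
    # Replace SGD with the Adai optimizer from the linked repo for a faithful run.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=WEIGHT_DECAY)
    criterion = nn.CrossEntropyLoss()
    for _ in range(EPOCHS):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
    acc = evaluate(model)
    print(f"lr={lr:g}: test acc={acc:.4f}")
    if acc > best_acc:
        best_lr, best_acc = lr, acc

print(f"Best learning rate from the grid: {best_lr:g} (acc={best_acc:.4f})")
```

For brevity the sketch selects the learning rate by test accuracy after two epochs; a faithful reproduction would train to convergence and, given the 'Dataset Splits' row above, carve a validation split out of the training set for tuning rather than selecting on the test set.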