Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

Authors: Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate that the proposed adaptive inertia method can generalize significantly better than SGD and conventional adaptive gradient methods.
Researcher Affiliation | Collaboration | The University of Tokyo, RIKEN Center for AIP, Microsoft Research Asia.
Pseudocode | Yes | Algorithm 1 Adam; Algorithm 2 Adai; Algorithm 3 AdaiS/AdaiW (a simplified sketch of the Adai update appears after this table).
Open Source Code | Yes | Code: https://github.com/zeke-xie/adaptive-inertia-adai
Open Datasets | Yes | Datasets: CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009), ImageNet (Deng et al., 2009), and Penn Treebank (Marcus et al., 1993).
Dataset Splits | No | The paper does not explicitly provide percentages, sample counts, or citations for train/validation/test splits, though it uses standard datasets and discusses "Test performance comparison", which implies validation sets are used for hyperparameter tuning.
Hardware Specification | Yes | The experiments are conducted on a computing cluster with NVIDIA® Tesla™ P100 16GB GPUs and Intel® Xeon® CPU E5-2640 v3 @ 2.60GHz CPUs.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python or PyTorch versions).
Experiment Setup | Yes | Hyperparameter settings for CIFAR-10 and CIFAR-100: We select the optimal learning rate for each experiment from {0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10}... The batch size is set to 128 and the L2 regularization hyperparameter to λ = 0.0005 for CIFAR-10 and CIFAR-100 (a configuration sketch appears after this table).
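The Pseudocode row above cites Algorithm 2 (Adai), which adapts the momentum (inertia) coefficient per parameter instead of the learning rate. Below is a minimal sketch of that update rule as described in the paper, not the authors' reference implementation (see the linked repository for the official optimizer). The defaults beta0=0.1, beta2=0.99, eps=1e-3 and the function name adai_step are assumptions here, and the momentum bias correction of Algorithm 2 is omitted for brevity.

```python
# Simplified sketch of one Adai update over parallel lists of tensors.
# NOT the official implementation; defaults and names are illustrative.
import torch


@torch.no_grad()
def adai_step(params, exp_avgs, exp_avg_sqs, step, lr,
              beta0=0.1, beta2=0.99, eps=1e-3):
    """One simplified Adai step; `step` is the 1-based iteration count."""
    grads = [p.grad for p in params]

    # 1) Second-moment estimates v_t and their bias-corrected global mean.
    bias_correction2 = 1.0 - beta2 ** step
    total, numel = 0.0, 0
    for g, v in zip(grads, exp_avg_sqs):
        v.mul_(beta2).addcmul_(g, g, value=1.0 - beta2)
        total += v.sum().item() / bias_correction2
        numel += v.numel()
    v_mean = total / numel

    # 2) Per-parameter inertia: beta1_t is large (heavy momentum) where the
    #    gradient second moment is small, clipped to [0, 1 - eps].
    for p, g, m, v in zip(params, grads, exp_avgs, exp_avg_sqs):
        v_hat = v / bias_correction2
        beta1_t = (1.0 - (beta0 / v_mean) * v_hat).clamp_(0.0, 1.0 - eps)
        m.mul_(beta1_t).add_((1.0 - beta1_t) * g)
        p.add_(m, alpha=-lr)
```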
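The Experiment Setup row lists the CIFAR settings reported in the paper (learning-rate grid, batch size 128, L2 regularization λ = 0.0005). The sketch below only wires up those quoted values in a generic PyTorch script; `torch.optim.SGD` is used as a stand-in optimizer and `build_optimizer` is a hypothetical helper, since the paper's Adai/AdaiW optimizers live in the linked repository.

```python
# Illustrative configuration only: the hyperparameters quoted above, with SGD
# as a stand-in for the Adai optimizer from the authors' repository.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

LR_GRID = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]  # learning-rate search grid
BATCH_SIZE = 128      # CIFAR-10 / CIFAR-100 batch size
WEIGHT_DECAY = 5e-4   # L2 regularization lambda

train_loader = DataLoader(
    datasets.CIFAR10("./data", train=True, download=True,
                     transform=transforms.ToTensor()),
    batch_size=BATCH_SIZE, shuffle=True)


def build_optimizer(model, lr):
    # Swap in Adai/AdaiW from the authors' repo here; SGD is a stand-in.
    return torch.optim.SGD(model.parameters(), lr=lr,
                           momentum=0.9, weight_decay=WEIGHT_DECAY)
```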