Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum
Authors: Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate that the proposed adaptive inertia method can generalize significantly better than SGD and conventional adaptive gradient methods. |
| Researcher Affiliation | Collaboration | ¹The University of Tokyo, ²RIKEN Center for AIP, ³Microsoft Research Asia. |
| Pseudocode | Yes | Algorithm 1: Adam; Algorithm 2: Adai; Algorithm 3: AdaiS/AdaiW |
| Open Source Code | Yes | Code: https://github.com/zeke-xie/adaptive-inertia-adai |
| Open Datasets | Yes | Datasets: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), ImageNet (Deng et al., 2009), and Penn Treebank (Marcus et al., 1993). |
| Dataset Splits | No | The paper does not explicitly provide percentages, sample counts, or citations for train/validation/test splits. It refers to standard datasets and discusses 'Test performance comparison', which implies validation sets are used for hyperparameter tuning. |
| Hardware Specification | Yes | The experiments are conducted on a computing cluster with NVIDIA® Tesla™ P100 16GB GPUs and Intel® Xeon® CPU E5-2640 v3 @ 2.60GHz. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | Hyperparameter Settings for CIFAR-10 and CIFAR-100: We select the optimal learning rate for each experiment from {0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10}... The batch size is set to 128 for CIFAR-10 and CIFAR-100. The L2 regularization hyperparameter is set to λ = 0.0005 for CIFAR-10 and CIFAR-100. |
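
The quoted experiment setup translates directly into a small grid-search script. The sketch below is not the authors' code: it assumes a standard PyTorch/torchvision pipeline and uses `torch.optim.SGD` as a stand-in optimizer, since the exact constructor of the Adai optimizer in the linked repository is not reproduced in this table. The learning-rate grid, batch size 128, and L2 regularization λ = 0.0005 follow the quoted settings.

```python
# Minimal sketch (not the authors' code) of the quoted CIFAR-10 hyperparameter search:
# lr in {1e-5, ..., 10}, batch size 128, weight decay 5e-4. torch.optim.SGD is a
# stand-in; the Adai optimizer from https://github.com/zeke-xie/adaptive-inertia-adai
# would be swapped in at the same place.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

LEARNING_RATES = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]  # grid quoted above
BATCH_SIZE = 128
WEIGHT_DECAY = 5e-4   # lambda = 0.0005 for CIFAR-10 and CIFAR-100
EPOCHS = 2            # illustrative only; the paper trains far longer

device = "cuda" if torch.cuda.is_available() else "cpu"

transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)

def evaluate(model):
    """Return test accuracy of the given model."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return correct / total

best_lr, best_acc = None, 0.0
for lr in LEARNING_RATES:
    model = torchvision.models.resnet18(num_classes=10).to(device)
    # Replace SGD with the Adai optimizer from the linked repo for a faithful run.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=WEIGHT_DECAY)
    criterion = nn.CrossEntropyLoss()
    for _ in range(EPOCHS):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
    acc = evaluate(model)
    print(f"lr={lr:g}: test acc={acc:.4f}")
    if acc > best_acc:
        best_lr, best_acc = lr, acc

print(f"Best learning rate from the grid: {best_lr:g} (acc={best_acc:.4f})")
```

For brevity the sketch selects the learning rate by test accuracy after two epochs; a faithful reproduction would train to convergence and, given the 'Dataset Splits' row above, carve a validation split out of the training set for tuning rather than selecting on the test set.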