Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

Authors: Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate that the proposed adaptive inertia method can generalize significantly better than SGD and conventional adaptive gradient methods.
Researcher Affiliation | Collaboration | The University of Tokyo; RIKEN Center for AIP; Microsoft Research Asia.
Pseudocode | Yes | Algorithm 1 Adam; Algorithm 2 Adai; Algorithm 3 AdaiS/AdaiW
Open Source Code | Yes | Code: https://github.com/zeke-xie/adaptive-inertia-adai
Open Datasets | Yes | Datasets: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), ImageNet (Deng et al., 2009), and Penn Treebank (Marcus et al., 1993).
Dataset Splits | No | The paper does not explicitly give percentages, sample counts, or citations for its train/validation/test splits. It uses standard datasets and reports a "Test performance comparison", but it does not state how a validation set, if any, was used for hyperparameter tuning.
Hardware Specification | Yes | The experiments are conducted on a computing cluster with GPUs of NVIDIA® Tesla™ P100 16GB and CPUs of Intel® Xeon® CPU E5-2640 v3 @ 2.60GHz.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python or PyTorch versions).
Experiment Setup | Yes | Hyperparameter Settings for CIFAR-10 and CIFAR-100: We select the optimal learning rate for each experiment from {0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10}... The batch size is set to 128 for CIFAR-10 and CIFAR-100. The L2 regularization hyperparameter is set to λ = 0.0005 for CIFAR-10 and CIFAR-100.
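To make the Adai update (Algorithm 2 in the paper) concrete, here is a small pure-Python sketch based on our reading of it. It keeps an EMA of squared gradients v_t with Adam-style bias correction and sets a per-coordinate momentum coefficient β1,t = clip(1 − β0 · v̂_t / mean(v̂_t), 0, 1 − ε). It then bias-corrects the momentum buffer by 1 − ∏ β1 and steps without dividing by sqrt(v̂_t). Because the sketch works on one flat parameter list, the mean runs over that list's coordinates. Several details are assumptions rather than facts confirmed here: the defaults β0 = 0.1, β2 = 0.99, ε = 0.001, the coupled L2 term `l2`, and the helper names `AdaiState`, `adai_init`, and `adai_step`. The authors' PyTorch implementation in the linked repository remains the reference.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdaiState:
    """State for this hypothetical Adai sketch (one entry per coordinate)."""
    m: List[float]           # momentum buffer m_t
    v: List[float]           # EMA of squared gradients v_t
    beta1_prod: List[float]  # running product of beta1 per coordinate (bias correction)
    t: int = 0               # step counter


def adai_init(n: int) -> AdaiState:
    """Create zeroed state for a flat parameter vector of length n."""
    return AdaiState(m=[0.0] * n, v=[0.0] * n, beta1_prod=[1.0] * n)


def adai_step(
    theta: List[float],
    grad: List[float],
    state: AdaiState,
    lr: float,
    beta0: float = 0.1,   # assumed default
    beta2: float = 0.99,  # assumed default
    eps: float = 1e-3,    # assumed default; caps beta1 at 1 - eps
    l2: float = 0.0,      # assumed coupled L2 coefficient (e.g. 5e-4)
) -> List[float]:
    """Return parameters after one Adai-style step (a sketch, not the official code)."""
    state.t += 1
    # Coupled L2 regularization: fold lambda * theta into the gradient.
    g = [gi + l2 * pi for gi, pi in zip(grad, theta)]

    # Second moment and its bias-corrected estimate, as in Adam.
    state.v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(state.v, g)]
    bias2 = 1.0 - beta2 ** state.t
    v_hat = [vi / bias2 for vi in state.v]
    # Mean of v_hat over all coordinates, guarded against an all-zero history.
    v_bar = max(sum(v_hat) / len(v_hat), 1e-16)

    new_theta = []
    for i, gi in enumerate(g):
        # Adaptive inertia: larger v_hat relative to the mean gives smaller beta1.
        beta1 = min(max(1.0 - beta0 * v_hat[i] / v_bar, 0.0), 1.0 - eps)
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * gi
        state.beta1_prod[i] *= beta1
        m_hat = state.m[i] / (1.0 - state.beta1_prod[i])
        # Unlike Adam, the step is not divided by sqrt(v_hat).
        new_theta.append(theta[i] - lr * m_hat)
    return new_theta
```

A toy quadratic with gradient a_i · θ_i offers a quick sanity check. The bias correction makes m̂_1 equal to g_1, so the first call to `adai_step` should match a plain gradient step. Coordinates with large squared gradients end up with less momentum than the rest.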
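The CIFAR hyperparameter protocol quoted in the Experiment Setup row can be sketched as a simple grid search: pick the best learning rate from a fixed grid, with batch size 128 and λ = 0.0005. The function `select_learning_rate` and its `evaluate` callback are hypothetical names. The quote does not say which data split the selection score is computed on (see the Dataset Splits row), so the callback's source of scores is an assumption left to the caller.

```python
import math
from typing import Callable, Dict, Iterable, Tuple

# Values quoted in the CIFAR-10 / CIFAR-100 hyperparameter settings.
LR_GRID = (1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0)
BATCH_SIZE = 128   # batch size for CIFAR-10 and CIFAR-100
L2_LAMBDA = 5e-4   # L2 regularization hyperparameter (lambda)


def select_learning_rate(
    evaluate: Callable[[float], float],
    grid: Iterable[float] = LR_GRID,
) -> Tuple[float, Dict[float, float]]:
    """Pick the learning rate from `grid` with the highest score.

    `evaluate` is a hypothetical callback: it should train with the given
    learning rate (plus BATCH_SIZE and L2_LAMBDA) and return a score where
    higher is better, e.g. held-out accuracy. Non-finite scores (such as NaN
    from a diverged run) are recorded as -inf so they never win. Because
    max() returns the first maximal key, ties go to the earlier grid entry.
    """
    scores: Dict[float, float] = {}
    for lr in grid:
        score = evaluate(lr)
        scores[lr] = score if math.isfinite(score) else float("-inf")
    best_lr = max(scores, key=scores.__getitem__)
    return best_lr, scores
```

Recording diverged runs as -inf keeps the largest grid values, which plausibly diverge for some optimizers, from breaking the comparison. Returning the full score table makes it easy to report the whole sweep rather than only the winner.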