SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Authors: Feihu Huang, Junyi Li, Heng Huang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In numerical experiments, we employ various deep learning tasks to validate that our algorithm consistently outperforms the existing adaptive algorithms.
Researcher Affiliation | Academia | Feihu Huang, Junyi Li, Heng Huang; Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, USA; huangfeihu2018@gmail.com, junyili.ai@gmail.com, heng.huang@pitt.edu
Pseudocode | Yes | Algorithm 1: SUPER-ADAM Algorithm
Open Source Code | Yes | Code is available at https://github.com/LIJUNYI95/SuperAdam
Open Datasets | Yes | Image classification on the CIFAR-10, CIFAR-100 and ImageNet datasets, and language modeling on the WikiText-2 dataset.
Dataset Splits | No | The paper mentions 'whenever the validation error increases' but does not provide the specific training/validation/test splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | Yes | All experiments are run over a machine with Intel Xeon E5-2683 CPU and 4 Nvidia Tesla P40 GPUs.
Software Dependencies | No | The paper mentions using models such as ResNet-18, VGG-19, LSTM, and Transformer but does not specify software dependencies with version numbers (e.g., a specific PyTorch or TensorFlow version).
Experiment Setup | Yes | For all the optimizers, we set the batch size as 128 and train for 200 epochs. For the learning rates and other hyper-parameters, we do a grid search and report the best one for each optimizer. In the Adam, AMSGrad and AdaBelief algorithms, we set the learning rate as 0.001.
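
The reported setup (batch size 128, 200 training epochs, grid-searched hyper-parameters, and a learning rate of 0.001 for the Adam-family baselines) can be approximated with a standard PyTorch training script. The sketch below is illustrative only: the choice of ResNet-18 on CIFAR-10 as the concrete task, torchvision's default train split, the normalization constants, and the use of torch.optim.Adam as a stand-in optimizer are assumptions rather than details confirmed above; reproducing the authors' method would mean swapping in the SUPER-ADAM optimizer from the linked repository.

```python
# Illustrative sketch of the reported CIFAR-10 training setup.
# Assumptions (not stated in the table above): PyTorch + torchvision,
# ResNet-18, cross-entropy loss, and torch.optim.Adam as a placeholder
# for the optimizers under comparison.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# CIFAR-10 with torchvision's default train split; the paper does not
# document its validation split, so none is carved out here.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(
    "./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True, num_workers=4)  # reported batch size 128

model = torchvision.models.resnet18(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()

# lr = 0.001 is the value reported for the Adam, AMSGrad and AdaBelief
# baselines; other hyper-parameters were grid-searched and are not fixed here.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(200):  # reported training length of 200 epochs
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

As a side note, the AMSGrad baseline can be obtained from the same stand-in via torch.optim.Adam(..., amsgrad=True), whereas AdaBelief and SUPER-ADAM require their respective released implementations.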