Robustness to Unbounded Smoothness of Generalized SignSGD

Authors: Michael Crawshaw, Mingrui Liu, Francesco Orabona, Wei Zhang, Zhenxun Zhuang

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The experimental results are shown in Section 5, comparing our algorithm with some popular competitors in deep learning tasks. We conducted our experiments using PyTorch [41] on Nvidia V100 GPUs. |
| Researcher Affiliation | Collaboration | Michael Crawshaw, George Mason University, mcrawsha@gmu.edu; Mingrui Liu, George Mason University, mingruil@gmu.edu; Francesco Orabona, Boston University, francesco@orabona.com; Wei Zhang, IBM T. J. Watson Research Center, weiz@us.ibm.com; Zhenxun Zhuang, Meta Platforms, Inc., oldboymls@gmail.com |
| Pseudocode | Yes | Algorithm 1 Generalized SignSGD (All operations on vectors are element-wise.) A hedged sketch of this update appears after the table. |
| Open Source Code | Yes | Codes can be found at https://github.com/zhenxun-zhuang/Generalized-SignSGD. |
| Open Datasets | Yes | We employ the 20-layer Residual Network model [18] to do image classification on the CIFAR-10 dataset. We adopt a 3-layer AWD-LSTM [35] to do language modeling on the Penn Treebank (PTB) dataset [33] (word level). |
| Dataset Splits | Yes | We use grid search to fine-tune the initial learning rate for all optimizers, as well as the clipping threshold for SGDClipGrad and SGDClipMomentum, and β2 for Adam and our algorithm, to select the one giving the best validation performance on a separate validation set. A sketch of such a selection loop appears after the table. |
| Hardware Specification | Yes | We conducted our experiments using PyTorch [41] on Nvidia V100 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch [41] but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The mini-batch size is 128 and we train all algorithms for 164 epochs. We fixed the weight decay value to be 0.0001 and the momentum parameter (β1) to be 0.9. See the training-loop sketch after the table. |
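
The paper's Algorithm 1 is only named in the table above. As an illustration, here is a minimal PyTorch sketch of an element-wise, Adam-style "generalized sign" update in which β1 = β2 = 0 recovers plain SignSGD. The exact form of the denominator, the absence of bias correction, and the `eps` safeguard are assumptions made for this sketch, not details confirmed by the excerpt.

```python
import torch

def generalized_signsgd_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One element-wise update in the spirit of Algorithm 1 (assumed form).

    The step is m / sqrt(v) with exponential moving averages m and v;
    with beta1 = beta2 = 0 it reduces to -lr * sign(grad).
    """
    # Lazily initialize the moving averages for this parameter.
    m = state.setdefault("m", torch.zeros_like(param))
    v = state.setdefault("v", torch.zeros_like(param))

    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # first moment (momentum)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second-moment estimate

    # Element-wise "generalized sign" step; eps is a numerical safeguard
    # added here, not part of the quoted pseudocode.
    param.addcdiv_(m, v.sqrt().add_(eps), value=-lr)
```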
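
The Dataset Splits row describes tuning by grid search against a held-out validation set. A minimal sketch of that selection loop follows; the grid values and the `train_and_validate` routine are hypothetical placeholders, not taken from the paper.

```python
from itertools import product

def train_and_validate(lr, beta2):
    """Placeholder: train with the given hyperparameters, return validation loss."""
    raise NotImplementedError

# Hypothetical search grid; the excerpt does not list the actual values tried.
learning_rates = [1e-3, 1e-2, 1e-1]
beta2_values = [0.99, 0.999]

best_config, best_val = None, float("inf")
for lr, beta2 in product(learning_rates, beta2_values):
    val_loss = train_and_validate(lr=lr, beta2=beta2)
    if val_loss < best_val:
        best_config, best_val = (lr, beta2), val_loss
```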
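
The Experiment Setup row fixes the batch size, epoch count, weight decay, and momentum. The sketch below shows where each reported constant plugs into a standard PyTorch training loop; the model constructor, the dataset object, and the initial learning rate of 0.1 are placeholders (the paper tunes the learning rate by grid search), and plain momentum SGD stands in for the paper's own optimizer.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

# Reported constants: batch size 128, 164 epochs, weight decay 1e-4, beta1 = 0.9.
loader = DataLoader(cifar10_train, batch_size=128, shuffle=True)  # cifar10_train: placeholder dataset
model = build_resnet20()  # placeholder for the 20-layer ResNet of [18]

# lr=0.1 is a placeholder; the paper selects it by grid search on validation data.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

for epoch in range(164):
    for x, y in loader:
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        optimizer.step()
```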