Robustness to Unbounded Smoothness of Generalized SignSGD
Authors: Michael Crawshaw, Mingrui Liu, Francesco Orabona, Wei Zhang, Zhenxun Zhuang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results are shown in Section 5, comparing our algorithm with some popular competitors in deep learning tasks. We conducted our experiments using PyTorch [41] on Nvidia V100 GPUs. |
| Researcher Affiliation | Collaboration | Michael Crawshaw George Mason University mcrawsha@gmu.edu Mingrui Liu George Mason University mingruil@gmu.edu Francesco Orabona Boston University francesco@orabona.com Wei Zhang IBM T. J. Watson Research Center weiz@us.ibm.com Zhenxun Zhuang Meta Platforms, Inc. oldboymls@gmail.com |
| Pseudocode | Yes | Algorithm 1 Generalized SignSGD (All operations on vectors are element-wise.) |
| Open Source Code | Yes | Codes can be found at https://github.com/zhenxun-zhuang/Generalized-SignSGD. |
| Open Datasets | Yes | We employ the 20-layer Residual Network model [18] to do image classification on the CIFAR-10 dataset. We adopt a 3-layer AWD-LSTM [35] to do language modeling on the Penn Treebank (PTB) dataset [33] (word level). |
| Dataset Splits | Yes | We use grid-search to fine-tune the initial learning rate for all optimizers, as well as the clipping threshold for SGDClipGrad and SGDClipMomentum, and β2 for Adam and our algorithm, to select the one giving the best validation performance on a separated validation set. |
| Hardware Specification | Yes | We conducted our experiments using PyTorch [41] on Nvidia V100 GPUs. |
| Software Dependencies | No | The paper mentions using 'PyTorch [41]' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The mini-batch size is 128 and we train all algorithms for 164 epochs. We fixed the weight decay value to be 0.0001 and the momentum parameter (β1) to be 0.9. |
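To make the quoted hyperparameters concrete, the following is a minimal, hedged sketch of a sign-based SGD step with momentum and weight decay, using the values the report cites (β1 = 0.9, weight decay = 0.0001). It is an illustration only, not the paper's Algorithm 1, whose exact update rule is not reproduced in the excerpts above; the function name and list-based parameter handling are assumptions for this sketch.

```python
import math

# Illustrative only: a generic sign-SGD-with-momentum update, NOT the
# paper's Generalized SignSGD (Algorithm 1). Hyperparameters match the
# quoted setup: beta1 = 0.9, weight decay = 0.0001.
def sign_sgd_step(params, grads, momentum, lr=0.1, beta1=0.9, weight_decay=1e-4):
    """One element-wise sign-based update over flat parameter lists."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        m_new = beta1 * m + (1 - beta1) * g                       # momentum buffer
        step = math.copysign(1.0, m_new) if m_new != 0 else 0.0   # sign of momentum
        p_new = p - lr * step - lr * weight_decay * p             # sign step + decay
        new_params.append(p_new)
        new_momentum.append(m_new)
    return new_params, new_momentum

params, mom = [1.0, -2.0], [0.0, 0.0]
grads = [0.5, -0.3]
params, mom = sign_sgd_step(params, grads, mom)
print(params)  # each parameter moves opposite the sign of its momentum
```

The key property of sign-based methods, and the reason they are robust to unbounded smoothness, is that the step magnitude is `lr` regardless of the gradient's scale; only the sign of the (momentum-averaged) gradient is used.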