On the Noisy Gradient Descent that Generalizes as SGD
Authors: Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this section we present our empirical results. The setup details are explained in Supplementary Materials, Section C. The code is available at https://github.com/uuujf/MultiNoise. In Figure 1 we test MSGD-Cov on various datasets and models. The results consistently suggest that MSGD-Cov can generalize as well as vanilla SGD, though its noise belongs to a different distribution class. More interestingly, we observe that MSGD-Cov converges faster than vanilla SGD." |
| Researcher Affiliation | Collaboration | 1Johns Hopkins University, Baltimore, MD, USA 2Missouri University of Science and Technology, Rolla, MO, USA 3Big Data Laboratory, Baidu Research, Beijing, China 4Styling.AI Inc., Beijing, China 5Peking University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 Multiplicative SGD and Algorithm 2 Mini-Batch Multiplicative SGD |
| Open Source Code | Yes | "The code is available at https://github.com/uuujf/MultiNoise." |
| Open Datasets | Yes | "In Figure 1 we test MSGD-Cov on various datasets and models." Figure 1 panels: (a) Small Fashion MNIST, (b) Small SVHN, (c) CIFAR-10. |
| Dataset Splits | No | The paper mentions training sets (e.g., "1,000 samples from Fashion MNIST as the training set") and reports test accuracy, but does not give the splitting methodology (percentages or counts) for training, validation, and test sets in the main text; setup details are deferred to the supplementary materials. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for its experiments (e.g., GPU models, CPU types, or cloud computing specifications). |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | No | The paper mentions the models and datasets used (e.g., “small convolutional network”, “VGG-11”, “ResNet-18” on “Fashion MNIST”, “SVHN”, “CIFAR-10”) and general training conditions (e.g., “without Batch Normalization”, “without using data augmentation and weight decay”), but it explicitly states that “The setup details are explained in Supplementary Materials, Section C.” and does not provide specific hyperparameter values or detailed experimental configurations in the main text. |
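The paper's Algorithm 1 (Multiplicative SGD) and the MSGD-Cov variant tested in Figure 1 are not reproduced here, so the following is only an illustrative sketch of the covariance-matching idea the table refers to: full-batch gradient descent with injected noise whose covariance matches that of mini-batch SGD noise. The toy linear-regression problem, the Gaussian noise model (the paper's multiplicative noise is non-Gaussian), and all hyperparameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: linear regression with per-sample loss
# l_i(w) = 0.5 * (x_i @ w - y_i)^2, so the per-sample gradient is
# (x_i @ w - y_i) * x_i.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def per_sample_grads(w):
    # All n per-sample gradients at once, shape (n, d).
    return (X @ w - y)[:, None] * X

def cov_matched_step(w, lr, batch_size):
    """One noisy-GD step whose noise covariance matches mini-batch SGD.

    Mini-batch SGD's gradient equals the full gradient plus zero-mean
    noise with covariance roughly Cov(g_i) / batch_size. Here we draw
    Gaussian noise with that covariance; this is a stand-in for the
    paper's MSGD-Cov, whose noise has the same covariance but a
    different (multiplicative) distribution.
    """
    G = per_sample_grads(w)
    g_mean = G.mean(axis=0)
    C = np.cov(G, rowvar=False) / batch_size  # estimated SGD noise covariance
    noise = rng.multivariate_normal(np.zeros(d), C)
    return w - lr * (g_mean + noise)

w = np.zeros(d)
for _ in range(500):
    w = cov_matched_step(w, lr=0.05, batch_size=16)

print(np.linalg.norm(w - w_true))  # small: iterates hover near the minimizer
```

The sketch only demonstrates the noise-construction step; the paper's actual experiments use neural networks (small ConvNet, VGG-11, ResNet-18) on Fashion MNIST, SVHN, and CIFAR-10, per the table above.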