On the Noisy Gradient Descent that Generalizes as SGD

Authors: Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we present our empirical results. The setup details are explained in Supplementary Materials, Section C. The code is available at https://github.com/uuujf/MultiNoise. In Figure 1 we test MSGD-Cov on various datasets and models. The results consistently suggest that the MSGD-Cov can generalize well as the vanilla SGD, though its noise belongs to a different distribution class. More interestingly, we observe that the MSGD-Cov converges faster than the vanilla SGD."
Researcher Affiliation | Collaboration | 1 Johns Hopkins University, Baltimore, MD, USA; 2 Missouri University of Science and Technology, Rolla, MO, USA; 3 Big Data Laboratory, Baidu Research, Beijing, China; 4 Styling.AI Inc., Beijing, China; 5 Peking University, Beijing, China.
Pseudocode | Yes | Algorithm 1 (Multiplicative SGD) and Algorithm 2 (Mini-Batch Multiplicative SGD).
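The paper's Algorithm 1 itself is not reproduced in this review. As a rough, hypothetical illustration of what a multiplicative-noise update step can look like (the function name `msgd_step`, its signature, and the choice of mean-one Gaussian multipliers are assumptions for this sketch, not the paper's exact algorithm):

```python
import numpy as np

def msgd_step(w, per_sample_grads, lr, rng, noise_scale=1.0):
    """One illustrative multiplicative-SGD step (hedged sketch).

    Each per-sample gradient is reweighted by a random multiplier u_i
    with E[u_i] = 1, so the injected noise is multiplicative and
    state-dependent (it scales with the gradients), unlike additive
    isotropic noise. This is only a toy stand-in for Algorithm 1.
    """
    n = per_sample_grads.shape[0]
    u = rng.normal(loc=1.0, scale=noise_scale, size=n)  # mean-1 weights
    g = (u[:, None] * per_sample_grads).mean(axis=0)    # reweighted average
    return w - lr * g

# With noise_scale=0 every u_i equals 1 and the update reduces to
# plain full-batch gradient descent.
```

Setting `noise_scale=0` gives a quick sanity check: the step then coincides with ordinary gradient descent on the averaged gradient.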
Open Source Code | Yes | "The code is available at https://github.com/uuujf/MultiNoise."
Open Datasets | Yes | "In Figure 1 we test MSGD-Cov on various datasets and models." Figure 1 panels: (a) Small Fashion MNIST, (b) Small SVHN, (c) CIFAR-10.
Dataset Splits | No | The paper mentions training sets (e.g., "1,000 samples from Fashion MNIST as the training set") and reports test accuracy, but does not provide specific details on validation sets or the precise splitting methodology (percentages, counts) for training, validation, and test sets in the main text. It defers setup details to supplementary materials.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for its experiments (e.g., GPU models, CPU types, or cloud computing specifications).
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | No | The paper mentions the models and datasets used (e.g., "small convolutional network", "VGG-11", "ResNet-18" on "Fashion MNIST", "SVHN", "CIFAR-10") and general training conditions (e.g., "without Batch Normalization", "without using data augmentation and weight decay"), but it explicitly states that "The setup details are explained in Supplementary Materials, Section C." and does not provide specific hyperparameter values or detailed experimental configurations in the main text.