Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Noisy Gradient Descent that Generalizes as SGD
Authors: Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we present our empirical results. The setup details are explained in Supplementary Materials, Section C. The code is available at https://github.com/uuujf/MultiNoise. In Figure 1 we test MSGD-Cov on various datasets and models. The results consistently suggest that the MSGD-Cov can generalize well as the vanilla SGD, though its noise belongs to a different distribution class. More interestingly, we observe that the MSGD-Cov converges faster than the vanilla SGD. |
| Researcher Affiliation | Collaboration | 1Johns Hopkins University, Baltimore, MD, USA 2Missouri University of Science and Technology, Rolla, MO, USA 3Big Data Laboratory, Baidu Research, Beijing, China 4Styling.AI Inc., Beijing, China 5Peking University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 Multiplicative SGD and Algorithm 2 Mini-Batch Multiplicative SGD |
| Open Source Code | Yes | The code is available at https://github.com/uuujf/MultiNoise. |
| Open Datasets | Yes | In Figure 1 we test MSGD-Cov on various datasets and models. The results consistently suggest that the MSGD-Cov can generalize well as the vanilla SGD, though its noise belongs to a different distribution class. More interestingly, we observe that the MSGD-Cov converges faster than the vanilla SGD. Figure 1 panels: (a) Small Fashion MNIST, (b) Small SVHN, (c) CIFAR-10. |
| Dataset Splits | No | The paper mentions training sets (e.g., “1,000 samples from Fashion MNIST as the training set”) and reports test accuracy, but does not provide specific details on validation sets or the precise splitting methodology (percentages, counts) for training, validation, and test sets in the main text. It defers setup details to supplementary materials. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for its experiments (e.g., GPU models, CPU types, or cloud computing specifications). |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | No | The paper mentions the models and datasets used (e.g., “small convolutional network”, “VGG-11”, “ResNet-18” on “Fashion MNIST”, “SVHN”, “CIFAR-10”) and general training conditions (e.g., “without Batch Normalization”, “without using data augmentation and weight decay”), but it explicitly states that “The setup details are explained in Supplementary Materials, Section C.” and does not provide specific hyperparameter values or detailed experimental configurations in the main text. |
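The Pseudocode row refers to the paper's Algorithm 1 (Multiplicative SGD) and Algorithm 2 (Mini-Batch Multiplicative SGD), which are not reproduced here. As a rough illustration of the underlying idea, the sketch below runs full-batch gradient descent and injects Gaussian noise whose covariance matches that of a size-`b` SGD mini-batch gradient. This is an assumption-laden simplification, not the authors' method: MSGD-Cov uses multiplicative (non-Gaussian) noise with the matched covariance structure, and the helper name, NumPy API usage, and parameters below are illustrative choices only.

```python
import numpy as np

def noisy_gd_step(theta, per_sample_grads, lr, batch_size, rng):
    """One noisy-GD step (illustrative sketch, not the paper's MSGD-Cov).

    Takes a full-batch gradient step, then adds Gaussian noise whose
    covariance equals the covariance of a size-`batch_size` SGD
    mini-batch gradient estimated from `per_sample_grads` (n x d).
    """
    n = per_sample_grads.shape[0]
    mean_grad = per_sample_grads.mean(axis=0)       # full-batch gradient
    centered = per_sample_grads - mean_grad
    # Empirical per-sample gradient covariance, scaled for a mini-batch
    # of size `batch_size` (sampling with replacement approximation).
    cov = centered.T @ centered / (n * batch_size)
    noise = rng.multivariate_normal(np.zeros_like(mean_grad), cov)
    return theta - lr * (mean_grad + noise)
```

A usage sketch: given an `(n, d)` array of per-sample gradients, repeated calls to `noisy_gd_step` mimic the noise scale of SGD while always using the exact full-batch descent direction.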