On Linear Stability of SGD and Input-Smoothness of Neural Networks
Authors: Chao Ma, Lexing Ying
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1 shows the results for a fully-connected network trained on Fashion MNIST dataset and a VGG-11 network trained on CIFAR10 dataset. |
| Researcher Affiliation | Academia | Chao Ma, Department of Mathematics, Stanford University, Stanford, CA 94305, chaoma@stanford.edu; Lexing Ying, Department of Mathematics, Stanford University, Stanford, CA 94305, lexing@stanford.edu |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Section F and the URL therein. |
| Open Datasets | Yes | Figure 1 shows the results for a fully-connected network trained on Fashion MNIST dataset and a VGG-11 network trained on CIFAR10 dataset. |
| Dataset Splits | No | The paper mentions training and testing sets, but does not provide specific training/validation/test dataset splits (e.g., percentages or counts) or reference predefined splits with citations for reproducibility. |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA 2080 Ti GPU. |
| Software Dependencies | Yes | All models are implemented in PyTorch 1.9.1 with CUDA 11.1, and trained with Python 3.8. |
| Experiment Setup | Yes | For Fashion MNIST, the fully connected network has 2 hidden layers with 1024 neurons each, and uses ReLU activation. It is trained for 50 epochs with a batch size of 100, and initial learning rate 0.1, which is decayed by 0.1 at epoch 20 and 40. For CIFAR10, we use VGG-11 [29] (with batch normalization) without data augmentation. It is trained for 150 epochs with a batch size of 128, and initial learning rate 0.1, which is decayed by 0.1 at epoch 75 and 125. The SGD optimizer is used for all experiments with momentum 0.9 and weight decay 5e-4. |
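
The experiment-setup row translates directly into a training configuration. Below is a minimal PyTorch sketch of the Fashion MNIST setup quoted above (a fully connected network with two 1024-unit ReLU hidden layers, SGD with momentum 0.9 and weight decay 5e-4, batch size 100, learning rate 0.1 decayed by 0.1 at epochs 20 and 40, 50 epochs total). It is an illustration assembled from the quoted text, not the authors' released code; the data path, `ToTensor` transform, and cross-entropy loss are assumptions.

```python
# Minimal sketch of the Fashion MNIST setup described in the paper's text
# (not the authors' released code). Only the hyperparameters come from the
# quoted setup; data path, transform, and loss are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Fully connected network: 2 hidden layers with 1024 neurons each, ReLU activation.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
).to(device)

# SGD with momentum 0.9 and weight decay 5e-4; initial learning rate 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Decay the learning rate by a factor of 0.1 at epochs 20 and 40 (of 50 total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[20, 40], gamma=0.1)
criterion = nn.CrossEntropyLoss()

train_set = datasets.FashionMNIST(root="./data", train=True, download=True,
                                  transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=100, shuffle=True)

for epoch in range(50):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The CIFAR10 configuration quoted above differs only in the model and schedule: VGG-11 with batch normalization, no data augmentation, 150 epochs, batch size 128, and the same 0.1 learning rate decayed by 0.1 at epochs 75 and 125 (e.g., `milestones=[75, 125]` in the scheduler above). The paper's VGG-11 may be a CIFAR-adapted variant rather than torchvision's ImageNet-sized `vgg11_bn`, so that model definition is not reproduced here.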