On Linear Stability of SGD and Input-Smoothness of Neural Networks

Authors: Chao Ma, Lexing Ying

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1 shows the results for a fully-connected network trained on Fashion MNIST dataset and a VGG-11 network trained on CIFAR10 dataset.
Researcher Affiliation | Academia | Chao Ma, Department of Mathematics, Stanford University, Stanford, CA 94305, chaoma@stanford.edu; Lexing Ying, Department of Mathematics, Stanford University, Stanford, CA 94305, lexing@stanford.edu
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Section F and the URL therein.
Open Datasets | Yes | Figure 1 shows the results for a fully-connected network trained on Fashion MNIST dataset and a VGG-11 network trained on CIFAR10 dataset.
Dataset Splits | No | The paper mentions training and testing sets, but does not provide specific training/validation/test dataset splits (e.g., percentages or counts) or reference predefined splits with citations for reproducibility.
Hardware Specification | Yes | All experiments were run on a single NVIDIA 2080 Ti GPU.
Software Dependencies | Yes | All models are implemented in PyTorch 1.9.1 with CUDA 11.1, and trained with Python 3.8.
Experiment Setup | Yes | For Fashion MNIST, the fully connected network has 2 hidden layers with 1024 neurons each, and uses ReLU activation. It is trained for 50 epochs with a batch size of 100, and initial learning rate 0.1, which is decayed by 0.1 at epoch 20 and 40. For CIFAR10, we use VGG-11 [29] (with batch normalization) without data augmentation. It is trained for 150 epochs with a batch size of 128, and initial learning rate 0.1, which is decayed by 0.1 at epoch 75 and 125. The SGD optimizer is used for all experiments with momentum 0.9 and weight decay 5e-4.
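
The hyperparameters quoted under Experiment Setup can be collected into a small configuration sketch. The following is not the authors' code: the names `TrainConfig`, `lr_at_epoch`, `FASHION_MNIST_FCN`, and `CIFAR10_VGG11` are hypothetical, "decayed by 0.1" is read as multiplying the learning rate by 0.1, and epochs are assumed to be counted from 0 so that the decay takes effect at the start of the listed epoch. Under those assumptions, the step schedule behaves like a standard milestone-based scheduler such as PyTorch's `MultiStepLR`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    epochs: int
    batch_size: int
    base_lr: float
    milestones: tuple           # epochs at which the learning rate is decayed
    decay_factor: float = 0.1   # assumption: "decayed by 0.1" means multiply by 0.1
    momentum: float = 0.9       # SGD momentum, shared by all experiments
    weight_decay: float = 5e-4  # SGD weight decay, shared by all experiments

    def lr_at_epoch(self, epoch: int) -> float:
        """Learning rate in effect during `epoch` (assumption: 0-indexed epochs)."""
        passed = sum(1 for m in self.milestones if epoch >= m)
        return self.base_lr * self.decay_factor ** passed


# Fully connected network on Fashion MNIST: 50 epochs, batch size 100.
FASHION_MNIST_FCN = TrainConfig(epochs=50, batch_size=100, base_lr=0.1,
                                milestones=(20, 40))

# VGG-11 (with batch normalization) on CIFAR10: 150 epochs, batch size 128.
CIFAR10_VGG11 = TrainConfig(epochs=150, batch_size=128, base_lr=0.1,
                            milestones=(75, 125))

if __name__ == "__main__":
    for name, cfg in [("Fashion MNIST / FCN", FASHION_MNIST_FCN),
                      ("CIFAR10 / VGG-11", CIFAR10_VGG11)]:
        # Print the learning rate at the first epoch, each milestone, and the last epoch.
        checkpoints = (0,) + cfg.milestones + (cfg.epochs - 1,)
        schedule = ", ".join(f"epoch {e}: {cfg.lr_at_epoch(e):.4g}" for e in checkpoints)
        print(f"{name}: {schedule}")
```

If the paper's released code counts epochs from 1 or applies the decay at the end of the listed epoch, the milestone comparison would shift by one; the summary above does not settle this.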