Visualizing the Loss Landscape of Neural Nets
Authors: Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods." "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs." "To understand the effects of network architecture on non-convexity, we trained a number of networks, and plotted the landscape around the obtained minimizers using the filter-normalized random direction method described in Section 4." (A sketch of filter normalization is given below the table.) |
| Researcher Affiliation | Academia | Hao Li¹, Zheng Xu¹, Gavin Taylor², Christoph Studer³, Tom Goldstein¹ (¹University of Maryland, College Park; ²United States Naval Academy; ³Cornell University) |
| Pseudocode | No | No pseudocode or algorithm block is present in the paper. |
| Open Source Code | Yes | Code and plots are available at https://github.com/tomgoldstein/loss-landscape |
| Open Datasets | Yes | "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs." "All models are trained on the CIFAR-10 dataset using SGD with Nesterov momentum, batch-size 128, and 0.0005 weight decay for 300 epochs." |
| Dataset Splits | No | The paper mentions training and testing data but does not explicitly provide details on validation dataset splits or usage. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) are provided in the paper. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for dependencies (e.g., Python, PyTorch, TensorFlow, specific solver versions). |
| Experiment Setup | Yes | "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs. We use two batch sizes: a large batch size of 8192 (16.4% of the training data of CIFAR-10), and a small batch size of 128." "All models are trained on the CIFAR-10 dataset using SGD with Nesterov momentum, batch-size 128, and 0.0005 weight decay for 300 epochs. The learning rate was initialized at 0.1, and decreased by a factor of 10 at epochs 150, 225 and 275. Deeper experimental VGG-like networks (e.g., ResNet-56-noshort, as described below) required a smaller initial learning rate of 0.01." (A PyTorch sketch of this training configuration follows the table.) |
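
The "filter-normalized random direction" method quoted in the Research Type row draws a random Gaussian direction with the same shape as the trained weights and rescales each filter so its norm matches the norm of the corresponding filter in the model, before plotting the loss along that direction. The sketch below is a minimal PyTorch illustration of that idea, not the authors' implementation: the function names `filter_normalized_direction` and `perturbed_loss`, the zeroing of bias/BatchNorm parameters, and the small epsilon for numerical stability are assumptions made for this example.

```python
import torch

def filter_normalized_direction(model):
    """Draw a random Gaussian direction shaped like the model's parameters,
    then rescale each filter of the direction to have the same norm as the
    corresponding filter of the trained weights (filter normalization)."""
    direction = []
    for param in model.parameters():
        d = torch.randn_like(param)
        if param.dim() <= 1:
            # Assumption: biases and BatchNorm parameters are set to zero
            # rather than perturbed.
            d.zero_()
        else:
            # Treat dim 0 as the filter index and match norms filter by
            # filter: d_ij <- d_ij / ||d_ij|| * ||theta_ij||.
            for d_filt, w_filt in zip(d, param):
                d_filt.mul_(w_filt.norm() / (d_filt.norm() + 1e-10))
        direction.append(d)
    return direction

def perturbed_loss(model, direction, alpha, loss_fn, data_loader):
    """Evaluate the average loss at theta + alpha * d, then restore theta."""
    original = [p.detach().clone() for p in model.parameters()]
    model.eval()  # avoid updating BatchNorm running statistics
    with torch.no_grad():
        for p, d in zip(model.parameters(), direction):
            p.add_(alpha * d)
        total, count = 0.0, 0
        for x, y in data_loader:
            total += loss_fn(model(x), y).item() * x.size(0)
            count += x.size(0)
        # Restore the original weights.
        for p, w in zip(model.parameters(), original):
            p.copy_(w)
    return total / count
```

Evaluating `perturbed_loss` over a grid of `alpha` values (or over two such directions for a 2-D contour) reproduces the kind of 1-D and 2-D landscape plots described in the paper.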
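
The training recipe quoted in the Experiment Setup row maps onto a standard PyTorch loop. The sketch below is an illustrative reconstruction under stated assumptions: the momentum coefficient (0.9), the plain `ToTensor` transform, and `torchvision.models.vgg11_bn` as a stand-in for the paper's 9-layer VGG are not specified in the excerpt and should not be read as the authors' exact configuration.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Values quoted in the paper's setup; momentum=0.9 is an assumption, since the
# excerpt states Nesterov momentum without giving the coefficient.
batch_size = 128      # small-batch setting; the large-batch runs use 8192
weight_decay = 5e-4
epochs = 300
init_lr = 0.1         # 0.01 for deeper networks without skip connections

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=batch_size, shuffle=True)

# Stand-in architecture for the paper's 9-layer VGG with batch normalization.
model = torchvision.models.vgg11_bn(num_classes=10)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=init_lr,
                            momentum=0.9, nesterov=True,
                            weight_decay=weight_decay)
# Decrease the learning rate by a factor of 10 at epochs 150, 225, and 275.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 225, 275], gamma=0.1)

for epoch in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Per the quoted setup, the large-batch comparison runs use the same recipe with `batch_size = 8192` (16.4% of the CIFAR-10 training set).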