Visualizing the Loss Landscape of Neural Nets

Authors: Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods." ... "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs." ... "To understand the effects of network architecture on non-convexity, we trained a number of networks, and plotted the landscape around the obtained minimizers using the filter-normalized random direction method described in Section 4." — see the filter-normalization sketch after this table.
Researcher Affiliation | Academia | Hao Li (1), Zheng Xu (1), Gavin Taylor (2), Christoph Studer (3), Tom Goldstein (1); (1) University of Maryland, College Park; (2) United States Naval Academy; (3) Cornell University
Pseudocode | No | No pseudocode or algorithm block is present in the paper.
Open Source Code | Yes | "Code and plots are available at https://github.com/tomgoldstein/loss-landscape"
Open Datasets | Yes | "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs." ... "All models are trained on the CIFAR-10 dataset using SGD with Nesterov momentum, batch-size 128, and 0.0005 weight decay for 300 epochs." — see the data-loading sketch after this table.
Dataset Splits | No | The paper refers to training and testing data but does not explicitly describe a validation split or how one is used.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory amounts) are provided in the paper.
Software Dependencies | No | The paper does not name software dependencies with version numbers (e.g., specific Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs. We use two batch sizes: a large batch size of 8192 (16.4% of the training data of CIFAR-10), and a small batch size of 128." ... "All models are trained on the CIFAR-10 dataset using SGD with Nesterov momentum, batch-size 128, and 0.0005 weight decay for 300 epochs. The learning rate was initialized at 0.1, and decreased by a factor of 10 at epochs 150, 225 and 275. Deeper experimental VGG-like networks (e.g., ResNet-56-noshort, as described below) required a smaller initial learning rate of 0.01." — see the training-schedule sketch after this table.
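
The Open Datasets and Experiment Setup rows quote a CIFAR-10 pipeline with batch size 128. As a concrete, hedged illustration (the paper does not specify a framework in the quoted text), the following minimal sketch loads CIFAR-10 with torchvision; the augmentation transforms and worker count are assumptions, not details from the paper.

```python
# Data-loading sketch (assumed PyTorch/torchvision setup; batch size 128 is from the paper).
import torch
import torchvision
import torchvision.transforms as transforms

# Simple augmentation pipeline for the training split (an assumption, not from the paper).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128,
                                          shuffle=False, num_workers=2)
```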
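The Experiment Setup row reports SGD with Nesterov momentum, weight decay 0.0005, 300 epochs, and a learning rate of 0.1 decreased by a factor of 10 at epochs 150, 225, and 275. A minimal PyTorch sketch of that schedule follows; the momentum value of 0.9, the tiny stand-in model, and the reuse of `train_loader` from the data-loading sketch are assumptions not stated in the quoted text.

```python
# Training-schedule sketch (framework, momentum value, and model are assumptions).
import torch
import torch.nn as nn

# Stand-in model; the paper trains VGG-9 and ResNet variants instead.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()

# SGD with Nesterov momentum and 0.0005 weight decay, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
# Learning rate 0.1, divided by 10 at epochs 150, 225, and 275.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 225, 275], gamma=0.1)

for epoch in range(300):
    model.train()
    for x, y in train_loader:  # loader from the data-loading sketch above
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```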
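The filter-normalized random-direction method cited in the Research Type row works by drawing two random Gaussian directions with the same shape as the trained weights, rescaling each filter of each direction to match the norm of the corresponding filter in the trained weights, and then evaluating the loss on a 2-D grid of perturbations around the minimizer. The sketch below is an illustrative reimplementation of that idea, not the authors' released code at the repository above; the treatment of 1-D parameters (biases and batch norm) and all function names are assumptions.

```python
# Filter-normalized random-direction visualization sketch (illustrative, not the released code).
import torch


def filter_normalized_direction(model):
    """Gaussian direction with each filter rescaled to the norm of the
    corresponding filter in the trained weights."""
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:
            # Conv/linear weights: rescale each output filter (first dimension) separately.
            for d_f, p_f in zip(d, p):
                d_f.mul_(p_f.norm() / (d_f.norm() + 1e-10))
        else:
            # 1-D parameters (biases, batch norm): left unperturbed here (an assumption).
            d.zero_()
        direction.append(d)
    return direction


def loss_surface(model, loss_fn, loader, alphas, betas):
    """Evaluate loss(theta* + a*delta + b*eta) over a 2-D grid of (a, b)."""
    theta = [p.detach().clone() for p in model.parameters()]
    delta = filter_normalized_direction(model)
    eta = filter_normalized_direction(model)
    surface = torch.zeros(len(alphas), len(betas))
    model.eval()
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(betas):
                for p, t, d, e in zip(model.parameters(), theta, delta, eta):
                    p.copy_(t + a * d + b * e)
                total, count = 0.0, 0
                for x, y in loader:
                    total += loss_fn(model(x), y).item() * x.size(0)
                    count += x.size(0)
                surface[i, j] = total / count
        # Restore the trained weights before returning.
        for p, t in zip(model.parameters(), theta):
            p.copy_(t)
    return surface
```

A typical call would be `loss_surface(model, criterion, train_loader, torch.linspace(-1, 1, 25), torch.linspace(-1, 1, 25))`, with the resulting grid passed to a contour or surface plotter; the grid range and resolution here are illustrative choices, not values taken from the paper.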