Visualizing the Loss Landscape of Neural Nets

Authors: Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods." ... "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs." ... "To understand the effects of network architecture on non-convexity, we trained a number of networks, and plotted the landscape around the obtained minimizers using the filter-normalized random direction method described in Section 4." — see the filter-normalization sketch after this table.
Researcher Affiliation | Academia | Hao Li (1), Zheng Xu (1), Gavin Taylor (2), Christoph Studer (3), Tom Goldstein (1); (1) University of Maryland, College Park; (2) United States Naval Academy; (3) Cornell University
Pseudocode | No | No pseudocode or algorithm block is present in the paper.
Open Source Code | Yes | "Code and plots are available at https://github.com/tomgoldstein/loss-landscape"
Open Datasets | Yes | "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs." ... "All models are trained on the CIFAR-10 dataset using SGD with Nesterov momentum, batch-size 128, and 0.0005 weight decay for 300 epochs." — see the data-loading sketch after this table.
Dataset Splits | No | The paper refers to training and testing data but does not explicitly describe a validation split or how one is used.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory amounts) are provided in the paper.
Software Dependencies | No | The paper does not name software dependencies with version numbers (e.g., specific Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs. We use two batch sizes: a large batch size of 8192 (16.4% of the training data of CIFAR-10), and a small batch size of 128." ... "All models are trained on the CIFAR-10 dataset using SGD with Nesterov momentum, batch-size 128, and 0.0005 weight decay for 300 epochs. The learning rate was initialized at 0.1, and decreased by a factor of 10 at epochs 150, 225 and 275. Deeper experimental VGG-like networks (e.g., ResNet-56-noshort, as described below) required a smaller initial learning rate of 0.01." — see the training-schedule sketch after this table.
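
The Open Datasets and Experiment Setup rows quote a CIFAR-10 pipeline with batch size 128. As a concrete, hedged illustration (the paper does not specify a framework in the quoted text), the following minimal sketch loads CIFAR-10 with torchvision; the augmentation transforms and worker count are assumptions, not details from the paper.

```python
# Data-loading sketch (assumed PyTorch/torchvision setup; batch size 128 is from the paper).
import torch
import torchvision
import torchvision.transforms as transforms

# Simple augmentation pipeline for the training split (an assumption, not from the paper).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128,
                                          shuffle=False, num_workers=2)
```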
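The Experiment Setup row reports SGD with Nesterov momentum, weight decay 0.0005, 300 epochs, and a learning rate of 0.1 decreased by a factor of 10 at epochs 150, 225, and 275. A minimal PyTorch sketch of that schedule follows; the momentum value of 0.9, the tiny stand-in model, and the reuse of `train_loader` from the data-loading sketch are assumptions not stated in the quoted text.

```python
# Training-schedule sketch (framework, momentum value, and model are assumptions).
import torch
import torch.nn as nn

# Stand-in model; the paper trains VGG-9 and ResNet variants instead.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()

# SGD with Nesterov momentum and 0.0005 weight decay, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
# Learning rate 0.1, divided by 10 at epochs 150, 225, and 275.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 225, 275], gamma=0.1)

for epoch in range(300):
    model.train()
    for x, y in train_loader:  # loader from the data-loading sketch above
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```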
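The filter-normalized random-direction method cited in the Research Type row works by drawing two random Gaussian directions with the same shape as the trained weights, rescaling each filter of each direction to match the norm of the corresponding filter in the trained weights, and then evaluating the loss on a 2-D grid of perturbations around the minimizer. The sketch below is an illustrative reimplementation of that idea, not the authors' released code at the repository above; the treatment of 1-D parameters (biases and batch norm) and all function names are assumptions.

```python
# Filter-normalized random-direction visualization sketch (illustrative, not the released code).
import torch


def filter_normalized_direction(model):
    """Gaussian direction with each filter rescaled to the norm of the
    corresponding filter in the trained weights."""
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:
            # Conv/linear weights: rescale each output filter (first dimension) separately.
            for d_f, p_f in zip(d, p):
                d_f.mul_(p_f.norm() / (d_f.norm() + 1e-10))
        else:
            # 1-D parameters (biases, batch norm): left unperturbed here (an assumption).
            d.zero_()
        direction.append(d)
    return direction


def loss_surface(model, loss_fn, loader, alphas, betas):
    """Evaluate loss(theta* + a*delta + b*eta) over a 2-D grid of (a, b)."""
    theta = [p.detach().clone() for p in model.parameters()]
    delta = filter_normalized_direction(model)
    eta = filter_normalized_direction(model)
    surface = torch.zeros(len(alphas), len(betas))
    model.eval()
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(betas):
                for p, t, d, e in zip(model.parameters(), theta, delta, eta):
                    p.copy_(t + a * d + b * e)
                total, count = 0.0, 0
                for x, y in loader:
                    total += loss_fn(model(x), y).item() * x.size(0)
                    count += x.size(0)
                surface[i, j] = total / count
        # Restore the trained weights before returning.
        for p, t in zip(model.parameters(), theta):
            p.copy_(t)
    return surface
```

A typical call would be `loss_surface(model, criterion, train_loader, torch.linspace(-1, 1, 25), torch.linspace(-1, 1, 25))`, with the resulting grid passed to a contour or surface plotter; the grid range and resolution here are illustrative choices, not values taken from the paper.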