Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Visualizing the Loss Landscape of Neural Nets
Authors: Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods."; "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs."; "To understand the effects of network architecture on non-convexity, we trained a number of networks, and plotted the landscape around the obtained minimizers using the filter-normalized random direction method described in Section 4." |
| Researcher Affiliation | Academia | Hao Li¹, Zheng Xu¹, Gavin Taylor², Christoph Studer³, Tom Goldstein¹; ¹University of Maryland, College Park; ²United States Naval Academy; ³Cornell University |
| Pseudocode | No | No pseudocode or algorithm block is present in the paper. |
| Open Source Code | Yes | Code and plots are available at https://github.com/tomgoldstein/loss-landscape |
| Open Datasets | Yes | "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs."; "All models are trained on the CIFAR-10 dataset using SGD with Nesterov momentum, batch-size 128, and 0.0005 weight decay for 300 epochs." |
| Dataset Splits | No | The paper mentions training and testing data but does not explicitly provide details on validation dataset splits or usage. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) are provided in the paper. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for dependencies (e.g., Python, PyTorch, TensorFlow, specific solver versions). |
| Experiment Setup | Yes | "We train a CIFAR-10 classifier using a 9-layer VGG network [34] with batch normalization for a fixed number of epochs. We use two batch sizes: a large batch size of 8192 (16.4% of the training data of CIFAR-10), and a small batch size of 128."; "All models are trained on the CIFAR-10 dataset using SGD with Nesterov momentum, batch-size 128, and 0.0005 weight decay for 300 epochs. The learning rate was initialized at 0.1, and decreased by a factor of 10 at epochs 150, 225 and 275. Deeper experimental VGG-like networks (e.g., ResNet-56-noshort, as described below) required a smaller initial learning rate of 0.01." |
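
The learning-rate schedule and batch-size figures quoted in the Experiment Setup row can be checked with a few lines of plain Python. This is a minimal sketch, not the authors' training code: the function names `learning_rate` and `batch_fraction` are hypothetical, the schedule assumes 0-indexed epochs with each drop taking effect at the start of the listed epoch, and the training-set size of 50,000 is the standard CIFAR-10 training split rather than a number stated in the table.

```python
# Assumption: standard CIFAR-10 training split size (not stated in the table).
CIFAR10_TRAIN_SIZE = 50_000

# Values quoted in the Experiment Setup row.
TOTAL_EPOCHS = 300
MILESTONES = (150, 225, 275)


def learning_rate(epoch, base_lr=0.1, milestones=MILESTONES, factor=10.0):
    """Step schedule: divide base_lr by `factor` once per milestone reached.

    Assumes 0-indexed epochs, with a drop applied from the milestone epoch on.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / (factor ** drops)


def batch_fraction(batch_size, n_train=CIFAR10_TRAIN_SIZE):
    """Fraction of the training set covered by a single batch."""
    return batch_size / n_train


if __name__ == "__main__":
    # Print the rate just before and at each milestone, plus the final epoch.
    for epoch in (0, 149, 150, 224, 225, 274, 275, TOTAL_EPOCHS - 1):
        print(f"epoch {epoch:3d}: lr = {learning_rate(epoch):.5f}")

    # Compare the large and small batch sizes against the training set.
    for batch_size in (8192, 128):
        print(f"batch {batch_size}: {batch_fraction(batch_size):.2%} of training data")
```

Under these assumptions, a batch of 8192 covers 8192 / 50,000 ≈ 16.4% of the training data, which matches the figure quoted in the row. For the deeper networks that needed a smaller initial rate, the same function can be called with `base_lr=0.01`.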