Taxonomizing local versus global structure in neural network loss landscapes

Authors: Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Here, we perform a detailed empirical analysis of the loss landscape structure of thousands of neural network models, systematically varying learning tasks, model architectures, and/or quantity/quality of data."
Researcher Affiliation | Academia | "Yaoqing Yang (1), Liam Hodgkinson (1,2), Ryan Theisen (1), Joe Zou (1), Joseph E. Gonzalez (1), Kannan Ramchandran (1), Michael W. Mahoney (1,2)"; 1: University of California, Berkeley; 2: International Computer Science Institute
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "In order that our results can be reproduced and extended, we have open-sourced our code." https://github.com/nsfzyzz/loss_landscape_taxonomy
Open Datasets | Yes | "We demonstrate these results on a range of computer vision and natural language processing benchmarks (CIFAR-10, CIFAR-100, SVHN, and IWSLT 2016 De-En)..."
Dataset Splits | No | The paper mentions "training/testing accuracy" and a "test set" but does not specify validation splits or split proportions for any of the datasets used.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU or CPU models, or cloud computing instances) used for the experiments.
Software Dependencies | No | The paper mentions the "PyHessian software [11]" but does not provide version numbers for it or for any other software dependency.
Experiment Setup | Yes | "In the standard setting, batch size, learning rate, and weight decay are kept constant throughout training to study interactions between temperature-like parameters, load-like parameters, and the loss landscape. ... We will scale the network width to change the size of the network. For ResNet18, which contains four major blocks with channel width {k, 2k, 4k, 8k}, we select different values of k to obtain ResNet models with different widths."
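The width-scaling scheme quoted above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: `stage_widths` and `approx_conv_params` are made-up helper names, and the parameter count covers only the 3x3 convolutions in the residual blocks (ignoring the stem, downsampling 1x1 convolutions, and the classifier head). The standard ResNet18 corresponds to k = 64.

```python
# Hypothetical sketch of the paper's width-scaling scheme (not the authors'
# code): ResNet18 has four major stages with channel widths {k, 2k, 4k, 8k},
# and varying the base width k yields models of different sizes.

def stage_widths(k):
    """Channel widths of the four ResNet18 stages for base width k."""
    return [k, 2 * k, 4 * k, 8 * k]

def approx_conv_params(k):
    """Rough 3x3-conv parameter count: each stage has two BasicBlocks
    (four 3x3 convs); stem, 1x1 shortcuts, and the FC head are ignored."""
    total = 0
    in_ch = k  # channels entering the first stage (after the stem conv)
    for out_ch in stage_widths(k):
        for _ in range(4):  # 4 convs per stage = 2 BasicBlocks x 2 convs
            total += 3 * 3 * in_ch * out_ch
            in_ch = out_ch
    return total

for k in (8, 16, 32, 64):
    print(f"k={k:2d}  widths={stage_widths(k)}  ~params={approx_conv_params(k):,}")
```

Since every conv term scales with the product of its input and output channels, this rough count grows quadratically in k: doubling the base width roughly quadruples the model size, which matches the intuition behind using k as a single "size" knob.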