Taxonomizing local versus global structure in neural network loss landscapes

Authors: Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Here, we perform a detailed empirical analysis of the loss landscape structure of thousands of neural network models, systematically varying learning tasks, model architectures, and/or quantity/quality of data."
Researcher Affiliation | Academia | "Yaoqing Yang (1), Liam Hodgkinson (1,2), Ryan Theisen (1), Joe Zou (1), Joseph E. Gonzalez (1), Kannan Ramchandran (1), Michael W. Mahoney (1,2)"; 1: University of California, Berkeley; 2: International Computer Science Institute
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "In order that our results can be reproduced and extended, we have open-sourced our code." https://github.com/nsfzyzz/loss_landscape_taxonomy
Open Datasets | Yes | "We demonstrate these results on a range of computer vision and natural language processing benchmarks (CIFAR-10, CIFAR-100, SVHN, and IWSLT 2016 De-En)..."
Dataset Splits | No | The paper mentions "training/testing accuracy" and a "test set" but does not specify validation splits or split proportions for any of the datasets used.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU or CPU models, or cloud computing instances) used for the experiments.
Software Dependencies | No | The paper mentions the "PyHessian software [11]" but does not provide version numbers for it or for any other software dependency.
Experiment Setup | Yes | "In the standard setting, batch size, learning rate, and weight decay are kept constant throughout training to study interactions between temperature-like parameters, load-like parameters, and the loss landscape. ... We will scale the network width to change the size of the network. For ResNet18, which contains four major blocks with channel width {k, 2k, 4k, 8k}, we select different values of k to obtain ResNet models with different widths."
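The width-scaling scheme quoted above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: `stage_widths` and `approx_conv_params` are made-up helper names, and the parameter count covers only the 3x3 convolutions in the residual blocks (ignoring the stem, downsampling 1x1 convolutions, and the classifier head). The standard ResNet18 corresponds to k = 64.

```python
# Hypothetical sketch of the paper's width-scaling scheme (not the authors'
# code): ResNet18 has four major stages with channel widths {k, 2k, 4k, 8k},
# and varying the base width k yields models of different sizes.

def stage_widths(k):
    """Channel widths of the four ResNet18 stages for base width k."""
    return [k, 2 * k, 4 * k, 8 * k]

def approx_conv_params(k):
    """Rough 3x3-conv parameter count: each stage has two BasicBlocks
    (four 3x3 convs); stem, 1x1 shortcuts, and the FC head are ignored."""
    total = 0
    in_ch = k  # channels entering the first stage (after the stem conv)
    for out_ch in stage_widths(k):
        for _ in range(4):  # 4 convs per stage = 2 BasicBlocks x 2 convs
            total += 3 * 3 * in_ch * out_ch
            in_ch = out_ch
    return total

for k in (8, 16, 32, 64):
    print(f"k={k:2d}  widths={stage_widths(k)}  ~params={approx_conv_params(k):,}")
```

Since every conv term scales with the product of its input and output channels, this rough count grows quadratically in k: doubling the base width roughly quadruples the model size, which matches the intuition behind using k as a single "size" knob.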