Taxonomizing local versus global structure in neural network loss landscapes
Authors: Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we perform a detailed empirical analysis of the loss landscape structure of thousands of neural network models, systematically varying learning tasks, model architectures, and/or quantity/quality of data. |
| Researcher Affiliation | Academia | Yaoqing Yang¹, Liam Hodgkinson¹,², Ryan Theisen¹, Joe Zou¹, Joseph E. Gonzalez¹, Kannan Ramchandran¹, Michael W. Mahoney¹,² (¹University of California, Berkeley; ²International Computer Science Institute) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | In order that our results can be reproduced and extended, we have open-sourced our code: https://github.com/nsfzyzz/loss_landscape_taxonomy |
| Open Datasets | Yes | We demonstrate these results on a range of computer vision and natural language processing benchmarks (CIFAR-10, CIFAR-100, SVHN, and IWSLT 2016 De-En)... |
| Dataset Splits | No | The paper mentions 'training/testing accuracy' and a 'test set' but does not specify validation splits or proportions for any of the datasets used. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, or cloud computing instances) used for the experiments. |
| Software Dependencies | No | The paper mentions the 'PyHessian software [11]' but does not provide specific version numbers for it or any other software dependencies. (A from-scratch sketch of the curvature quantity PyHessian measures appears after this table.) |
| Experiment Setup | Yes | In the standard setting, batch size, learning rate, and weight decay are kept constant throughout training to study interactions between temperature-like parameters, load-like parameters, and the loss landscape. ... We will scale the network width to change the size of the network. For ResNet18, which contains four major blocks with channel widths {k, 2k, 4k, 8k}, we select different values of k to obtain ResNet models with different widths. (A width-scaling sketch follows after this table.) |
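
The width-scaling protocol quoted in the Experiment Setup row can be made concrete with a short sketch. The code below is not the authors' released implementation (see the repository linked above); it is a minimal, hypothetical CIFAR-style ResNet18 whose four stages use channel widths {k, 2k, 4k, 8k}, so that sweeping k varies model size in the way the paper describes. The class name `ResNet18Width` and the particular k values in the example loop are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): a CIFAR-style ResNet18 whose four
# stages use channel widths {k, 2k, 4k, 8k}; varying `k` changes the model size.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    """Standard two-convolution residual block."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))


class ResNet18Width(nn.Module):
    """ResNet18 for 32x32 inputs with stage widths k, 2k, 4k, 8k."""

    def __init__(self, k=64, num_classes=10):
        super().__init__()
        widths = [k, 2 * k, 4 * k, 8 * k]
        self.stem = nn.Sequential(
            nn.Conv2d(3, widths[0], 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(widths[0]),
            nn.ReLU(inplace=True),
        )
        stages, in_ch = [], widths[0]
        for i, w in enumerate(widths):
            stride = 1 if i == 0 else 2
            # Two BasicBlocks per stage, as in the standard ResNet18 layout.
            stages += [BasicBlock(in_ch, w, stride), BasicBlock(w, w)]
            in_ch = w
        self.stages = nn.Sequential(*stages)
        self.head = nn.Linear(widths[-1], num_classes)

    def forward(self, x):
        out = self.stages(self.stem(x))
        out = torch.flatten(F.adaptive_avg_pool2d(out, 1), 1)
        return self.head(out)


if __name__ == "__main__":
    # Example sweep: models of increasing width (k values are illustrative).
    for k in [8, 16, 32, 64]:
        model = ResNet18Width(k=k)
        n_params = sum(p.numel() for p in model.parameters())
        print(f"k={k:3d}: {n_params / 1e6:.2f}M parameters")
```

With the standard ResNet18 width corresponding to k = 64, selecting smaller values of k yields the progressively narrower models that the quoted setup uses to vary model size.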
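
Since the Software Dependencies row notes that PyHessian is used without a pinned version, the sketch below shows from scratch, in plain PyTorch, the kind of quantity such loss-landscape analyses report: the top Hessian eigenvalue of the training loss, estimated by power iteration over Hessian-vector products. This is not PyHessian's implementation or API; the function name and the `iters`/`tol` parameters are illustrative assumptions.

```python
# Minimal from-scratch sketch (not PyHessian itself): estimate the top Hessian
# eigenvalue of the loss via power iteration over Hessian-vector products.
import torch


def hessian_top_eigenvalue(model, loss_fn, inputs, targets, iters=20, tol=1e-4):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)
    # First-order gradients, keeping the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Random unit start vector, stored as one tensor per parameter.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / norm for u in v]

    eigenvalue = None
    for _ in range(iters):
        # Hessian-vector product: gradient of (grad . v) w.r.t. the parameters.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        new_eig = sum((u * w).sum() for u, w in zip(v, hv)).item()
        norm = torch.sqrt(sum((w * w).sum() for w in hv))
        v = [w / (norm + 1e-12) for w in hv]
        if eigenvalue is not None and abs(new_eig - eigenvalue) < tol * (abs(eigenvalue) + 1e-12):
            return new_eig
        eigenvalue = new_eig
    return eigenvalue
```

A typical call would pass a trained model, `nn.CrossEntropyLoss()`, and a single batch of training data, yielding one local-curvature number per model.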