What can linearized neural networks actually say about generalization?
Authors: Guillermo Ortiz-Jiménez, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our work, we provide strong empirical evidence to determine the practical validity of such approximation by conducting a systematic comparison of the behavior of different neural networks and their linear approximations on different tasks. |
| Researcher Affiliation | Academia | Guillermo Ortiz-Jiménez EPFL, Lausanne, Switzerland guillermo.ortizjimenez@epfl.ch, Seyed-Mohsen Moosavi-Dezfooli ETH Zurich, Zurich, Switzerland seyed.moosavi@inf.ethz.ch, Pascal Frossard EPFL, Lausanne, Switzerland pascal.frossard@epfl.ch |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our code can be found at https://github.com/gortizji/linearized-networks. |
| Open Datasets | Yes | In particular, we generate a sequence of datasets constructed using the standard CIFAR10 [29] samples, which we label using different binarized versions of the NTK eigenfunctions. |
| Dataset Splits | No | The paper mentions 'Validation accuracy' (e.g., in Figure 2 caption), implying the use of a validation set, but it does not provide specific details on the dataset splits (e.g., exact percentages or sample counts for training, validation, and test sets) in the main body of the paper. |
| Hardware Specification | No | No specific hardware details such as GPU/CPU models, processor types, or cloud instance types are provided in the main body of the paper. The paper checklist mentions 'See Appendix' for resource type, but this information is not in the main text. |
| Software Dependencies | No | The paper mentions using 'the neural_tangents library [27] built on top of the JAX framework [28]' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Unless stated otherwise, we always use the same standard training procedure consisting of the use of stochastic gradient descent (SGD) to optimize a logistic loss, with a decaying learning rate starting at 0.05 and momentum set to 0.9. The values of our metrics are reported after 100 epochs of training. |
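
The software-dependency and experiment-setup rows above pin down the toolchain (the neural_tangents library on JAX, versions unspecified) and the training recipe (SGD on a logistic loss, decaying learning rate starting at 0.05, momentum 0.9, metrics reported after 100 epochs). The sketch below shows one way that setup could be wired together; it is not the authors' code. The architecture, the exponential decay schedule, the use of optax for the optimizer, and the batch handling are all illustrative assumptions, since the paper does not specify them in the main text.

```python
# Minimal sketch (not the authors' code): linearize a network with
# neural_tangents and train it with the hyperparameters quoted above
# (SGD, logistic loss, initial learning rate 0.05 with decay, momentum 0.9).
import jax
import jax.numpy as jnp
import neural_tangents as nt
from neural_tangents import stax
import optax

# Toy fully connected architecture; the paper evaluates several standard
# architectures, so this choice is purely illustrative.
init_fn, apply_fn, _ = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

key = jax.random.PRNGKey(0)
_, params_0 = init_fn(key, (-1, 32 * 32 * 3))  # flattened CIFAR10-sized inputs

# First-order Taylor expansion of the network around its initialization.
apply_lin = nt.linearize(apply_fn, params_0)

def logistic_loss(params, x, y):
    # Binary labels y in {-1, +1}, matching the binarized tasks in the paper.
    logits = apply_lin(params, x).squeeze(-1)
    return jnp.mean(jnp.log1p(jnp.exp(-y * logits)))

# Learning rate starts at 0.05; the exact decay schedule is an assumption.
schedule = optax.exponential_decay(
    init_value=0.05, transition_steps=1000, decay_rate=0.5
)
optimizer = optax.sgd(learning_rate=schedule, momentum=0.9)
opt_state = optimizer.init(params_0)

@jax.jit
def train_step(params, opt_state, x, y):
    # One SGD-with-momentum step on a mini-batch (x, y).
    loss, grads = jax.value_and_grad(logistic_loss)(params, x, y)
    updates, opt_state = optimizer.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss
```

Swapping `apply_lin` for `apply_fn` inside `logistic_loss` trains the original nonlinear network under the same recipe, which is the kind of side-by-side comparison between networks and their linearizations that the paper reports.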