What can linearized neural networks actually say about generalization?

Authors: Guillermo Ortiz-Jiménez, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our work, we provide strong empirical evidence to determine the practical validity of such approximation by conducting a systematic comparison of the behavior of different neural networks and their linear approximations on different tasks."
Researcher Affiliation | Academia | Guillermo Ortiz-Jiménez, EPFL, Lausanne, Switzerland, guillermo.ortizjimenez@epfl.ch; Seyed-Mohsen Moosavi-Dezfooli, ETH Zurich, Zurich, Switzerland, seyed.moosavi@inf.ethz.ch; Pascal Frossard, EPFL, Lausanne, Switzerland, pascal.frossard@epfl.ch
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | "Our code can be found at https://github.com/gortizji/linearized-networks."
Open Datasets | Yes | "In particular, we generate a sequence of datasets constructed using the standard CIFAR10 [29] samples, which we label using different binarized versions of the NTK eigenfunctions."
Dataset Splits | No | The paper mentions "Validation accuracy" (e.g., in the Figure 2 caption), implying the use of a validation set, but it does not give concrete split details (exact percentages or sample counts for training, validation, and test sets) in the main body of the paper.
Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types, or cloud instance types) are provided in the main body of the paper. The paper checklist points to the appendix for the resource type, but that information is not in the main text.
Software Dependencies | No | The paper mentions using "the neural_tangents library [27] built on top of the JAX framework [28]" but does not provide version numbers for these dependencies.
Experiment Setup | Yes | "Unless stated otherwise, we always use the same standard training procedure consisting of the use of stochastic gradient descent (SGD) to optimize a logistic loss, with a decaying learning rate starting at 0.05 and momentum set to 0.9. The values of our metrics are reported after 100 epochs of training." (A hedged sketch of this setup follows the table.)
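
To make the reported setup concrete, below is a minimal, hedged sketch of the kind of pipeline the assessment describes: labels built from a binarized NTK eigenfunction, a linearized model obtained with the neural_tangents library on JAX, and training with SGD (momentum 0.9, learning rate decaying from 0.05) on a logistic loss. This is not the authors' released code (see the repository link in the table). The architecture, the random placeholder inputs standing in for CIFAR10, the choice of eigenvector, the exact decay schedule, the ±1 label encoding, the full-batch loop, and the use of optax for the optimizer are all assumptions made for illustration.

```python
# Hedged sketch only: NOT the authors' released code.
# Architecture, eigenvector index, decay schedule, +/-1 label encoding,
# random placeholder data, and the optax optimizer are assumptions.

import jax
import jax.numpy as jnp
import neural_tangents as nt
from neural_tangents import stax
import optax

# Small fully connected model as a stand-in for the architectures in the paper (assumption).
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Flatten(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (256, 32, 32, 3))  # CIFAR10-shaped placeholder inputs (assumption)

# Binary labels from an NTK "eigenfunction": sign of an eigenvector of the
# infinite-width NTK Gram matrix restricted to the training inputs.
ntk = kernel_fn(x, x, 'ntk')
_, eigvecs = jnp.linalg.eigh(ntk)             # eigenvectors in ascending eigenvalue order
y = jnp.sign(eigvecs[:, -2])[:, None]         # second-largest eigenvector (arbitrary choice)

_, params = init_fn(key, x.shape)

# Linearized model: first-order Taylor expansion of apply_fn around the initial params.
apply_lin = nt.linearize(apply_fn, params)

def logistic_loss(p, inputs, targets):
    logits = apply_lin(p, inputs)
    return jnp.mean(jax.nn.softplus(-targets * logits))  # log(1 + exp(-y * f(x)))

# SGD with momentum 0.9 and a learning rate decaying from 0.05
# (exponential decay is an assumption; the paper only says "decaying").
schedule = optax.exponential_decay(init_value=0.05, transition_steps=100, decay_rate=0.9)
optimizer = optax.sgd(learning_rate=schedule, momentum=0.9)
opt_state = optimizer.init(params)

@jax.jit
def train_step(p, state, inputs, targets):
    grads = jax.grad(logistic_loss)(p, inputs, targets)
    updates, state = optimizer.update(grads, state)
    return optax.apply_updates(p, updates), state

for _ in range(100):  # "100 epochs", run as full-batch updates for brevity
    params, opt_state = train_step(params, opt_state, x, y)
```

Swapping apply_lin for apply_fn in the loss trains the original nonlinear network under the same optimizer, which is the kind of side-by-side comparison the paper reports between networks and their linear approximations.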