What can linearized neural networks actually say about generalization?

Authors: Guillermo Ortiz-Jiménez, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our work, we provide strong empirical evidence to determine the practical validity of such approximation by conducting a systematic comparison of the behavior of different neural networks and their linear approximations on different tasks."
Researcher Affiliation | Academia | Guillermo Ortiz-Jiménez, EPFL, Lausanne, Switzerland, guillermo.ortizjimenez@epfl.ch; Seyed-Mohsen Moosavi-Dezfooli, ETH Zurich, Zurich, Switzerland, seyed.moosavi@inf.ethz.ch; Pascal Frossard, EPFL, Lausanne, Switzerland, pascal.frossard@epfl.ch
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | "Our code can be found at https://github.com/gortizji/linearized-networks."
Open Datasets | Yes | "In particular, we generate a sequence of datasets constructed using the standard CIFAR10 [29] samples, which we label using different binarized versions of the NTK eigenfunctions."
Dataset Splits | No | The paper mentions "Validation accuracy" (e.g., in the Figure 2 caption), implying the use of a validation set, but it does not give concrete split details (exact percentages or sample counts for training, validation, and test sets) in the main body of the paper.
Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types, or cloud instance types) are provided in the main body of the paper. The paper checklist points to the appendix for the resource type, but that information is not in the main text.
Software Dependencies | No | The paper mentions using "the neural_tangents library [27] built on top of the JAX framework [28]" but does not provide version numbers for these dependencies.
Experiment Setup | Yes | "Unless stated otherwise, we always use the same standard training procedure consisting of the use of stochastic gradient descent (SGD) to optimize a logistic loss, with a decaying learning rate starting at 0.05 and momentum set to 0.9. The values of our metrics are reported after 100 epochs of training." (A hedged sketch of this setup follows the table.)
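
To make the reported setup concrete, below is a minimal, hedged sketch of the kind of pipeline the assessment describes: labels built from a binarized NTK eigenfunction, a linearized model obtained with the neural_tangents library on JAX, and training with SGD (momentum 0.9, learning rate decaying from 0.05) on a logistic loss. This is not the authors' released code (see the repository link in the table). The architecture, the random placeholder inputs standing in for CIFAR10, the choice of eigenvector, the exact decay schedule, the ±1 label encoding, the full-batch loop, and the use of optax for the optimizer are all assumptions made for illustration.

```python
# Hedged sketch only: NOT the authors' released code.
# Architecture, eigenvector index, decay schedule, +/-1 label encoding,
# random placeholder data, and the optax optimizer are assumptions.

import jax
import jax.numpy as jnp
import neural_tangents as nt
from neural_tangents import stax
import optax

# Small fully connected model as a stand-in for the architectures in the paper (assumption).
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Flatten(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (256, 32, 32, 3))  # CIFAR10-shaped placeholder inputs (assumption)

# Binary labels from an NTK "eigenfunction": sign of an eigenvector of the
# infinite-width NTK Gram matrix restricted to the training inputs.
ntk = kernel_fn(x, x, 'ntk')
_, eigvecs = jnp.linalg.eigh(ntk)             # eigenvectors in ascending eigenvalue order
y = jnp.sign(eigvecs[:, -2])[:, None]         # second-largest eigenvector (arbitrary choice)

_, params = init_fn(key, x.shape)

# Linearized model: first-order Taylor expansion of apply_fn around the initial params.
apply_lin = nt.linearize(apply_fn, params)

def logistic_loss(p, inputs, targets):
    logits = apply_lin(p, inputs)
    return jnp.mean(jax.nn.softplus(-targets * logits))  # log(1 + exp(-y * f(x)))

# SGD with momentum 0.9 and a learning rate decaying from 0.05
# (exponential decay is an assumption; the paper only says "decaying").
schedule = optax.exponential_decay(init_value=0.05, transition_steps=100, decay_rate=0.9)
optimizer = optax.sgd(learning_rate=schedule, momentum=0.9)
opt_state = optimizer.init(params)

@jax.jit
def train_step(p, state, inputs, targets):
    grads = jax.grad(logistic_loss)(p, inputs, targets)
    updates, state = optimizer.update(grads, state)
    return optax.apply_updates(p, updates), state

for _ in range(100):  # "100 epochs", run as full-batch updates for brevity
    params, opt_state = train_step(params, opt_state, x, y)
```

Swapping apply_lin for apply_fn in the loss trains the original nonlinear network under the same optimizer, which is the kind of side-by-side comparison the paper reports between networks and their linear approximations.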