How much does Initialization Affect Generalization?

Authors: Sameera Ramasinghe, Lachlan Ewen Macdonald, Moshiur Farazi, Hemanth Saratchandran, Simon Lucey

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We further empirically test the developed theoretical insights using practical, deep networks. Finally, we contrast our framework with that supplied by the flat-minima conjecture and show that Fourier analysis grants a more reliable framework for understanding the generalization of neural networks." |
| Researcher Affiliation | Collaboration | (1) Amazon, Australia; (2) Australian Institute for Machine Learning, University of Adelaide, Adelaide SA, Australia; (3) Machine Learning and Artificial Intelligence FSP, Data61-CSIRO |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks; the methods are described through prose and mathematical equations. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor a link to a code repository for its methodology. |
| Open Datasets | Yes | "CIFAR10, CIFAR100, Tiny ImageNet... VGG11 (Simonyan & Zisserman, 2014)... ImageNet." |
| Dataset Splits | No | The paper mentions "train splits of the datasets" and "test splits" but gives no numerical details (percentages or sample counts) for how the splits were made, and does not explicitly mention a validation split for the main experiments in Table 1. |
| Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers needed to reproduce the experiments. |
| Experiment Setup | Yes | "We use SGD to optimize the networks with a learning rate of 1 × 10⁻⁴. The networks consist of 256 neurons in each hidden layer. All the networks are randomly initialized using Xavier initialization (Glorot & Bengio, 2010)... We use 4-layer networks where each layer's width is 256 neurons... We initialize the ReLU network using Xavier initialization and the Gaussian networks with N(0, 0.03)." |
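For concreteness, the Experiment Setup quoted in the last row maps onto roughly the following PyTorch sketch. This is a hedged reconstruction, not the authors' released code (none is available): the exact Gaussian activation form and its bandwidth `sigma`, the reading of N(0, 0.03) as a standard deviation, and the input/output dimensions are all assumptions made for illustration.

```python
# Minimal sketch of the quoted setup: 4-layer networks with 256-neuron hidden
# layers, optimized with SGD at lr = 1e-4. ReLU nets use Xavier initialization;
# Gaussian-activation nets use N(0, 0.03) weights (std = 0.03 is an assumption).
import torch
import torch.nn as nn

class Gaussian(nn.Module):
    """Gaussian activation exp(-x^2 / (2*sigma^2)); the form and sigma are assumptions."""
    def __init__(self, sigma: float = 0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        return torch.exp(-x.pow(2) / (2 * self.sigma ** 2))

def make_net(in_dim: int, out_dim: int, activation: str = "relu",
             width: int = 256, depth: int = 4) -> nn.Sequential:
    act = nn.ReLU if activation == "relu" else Gaussian
    layers, dim = [], in_dim
    for _ in range(depth - 1):                   # hidden layers
        layers += [nn.Linear(dim, width), act()]
        dim = width
    layers.append(nn.Linear(dim, out_dim))       # output layer
    net = nn.Sequential(*layers)

    for m in net.modules():
        if isinstance(m, nn.Linear):
            if activation == "relu":
                nn.init.xavier_uniform_(m.weight)              # Glorot & Bengio (2010)
            else:
                nn.init.normal_(m.weight, mean=0.0, std=0.03)  # N(0, 0.03)
            nn.init.zeros_(m.bias)
    return net

# Example: a ReLU network for 32x32 RGB inputs (dimensions assumed, e.g. CIFAR10).
net = make_net(in_dim=3 * 32 * 32, out_dim=10, activation="relu")
optimizer = torch.optim.SGD(net.parameters(), lr=1e-4)  # lr = 1e-4 as quoted
```

Passing `activation="gaussian"` to `make_net` yields the Gaussian-network variant with N(0, 0.03) weight initialization described in the same quote.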