How much does Initialization Affect Generalization?
Authors: Sameera Ramasinghe, Lachlan Ewen Macdonald, Moshiur Farazi, Hemanth Saratchandran, Simon Lucey
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further empirically test the developed theoretical insights using practical, deep networks. Finally, we contrast our framework with that supplied by the flat-minima conjecture and show that Fourier analysis grants a more reliable framework for understanding the generalization of neural networks. |
| Researcher Affiliation | Collaboration | 1Amazon, Australia 2Australian Institute of Machine Learning, University of Adelaide, Adelaide SA, Australia 3Machine Learning and Artificial Intelligence FSP, Data61-CSIRO. |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. The methods are described through prose and mathematical equations. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for their methodology. |
| Open Datasets | Yes | CIFAR10, CIFAR100, Tiny-ImageNet... VGG11 (Simonyan & Zisserman, 2014)... ImageNet. |
| Dataset Splits | No | The paper mentions "train splits of the datasets" and "test splits" but does not provide specific numerical details (percentages or sample counts) for how these splits were performed or explicitly mention a validation split for the main experiments in Table 1. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory) to run its experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers that would be needed to reproduce the experiments. |
| Experiment Setup | Yes | We use SGD to optimize the networks with a learning rate of 1e-4. The networks consist of 256 neurons in each hidden layer. All the networks are randomly initialized using Xavier initialization (Glorot & Bengio, 2010)... We use 4-layer networks where each layer's width is 256 neurons... We initialize the ReLU network using Xavier initialization and the Gaussian networks with N(0, 0.03). (See the sketch below the table.) |
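
As a rough illustration of the setup quoted in the last row, the sketch below builds a 4-layer, 256-wide MLP, applies either Xavier initialization (ReLU network) or N(0, 0.03) weights (Gaussian-activation variant), and creates an SGD optimizer with learning rate 1e-4. It is written in PyTorch; the input/output dimensions, the use of ReLU throughout, and the reading of 0.03 as a standard deviation are assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim=2, out_dim=1, width=256, depth=4, gaussian_init=False):
    """Build a 4-layer, 256-neuron-wide MLP as described in the setup row.

    Input/output sizes and activation placement are assumptions here.
    """
    layers = []
    dims = [in_dim] + [width] * depth + [out_dim]
    for i in range(len(dims) - 1):
        linear = nn.Linear(dims[i], dims[i + 1])
        if gaussian_init:
            # Gaussian-activation networks: weights drawn from N(0, 0.03)
            # (0.03 treated as the standard deviation; the paper does not say
            # whether it is the std or the variance).
            nn.init.normal_(linear.weight, mean=0.0, std=0.03)
        else:
            # ReLU networks: Xavier (Glorot & Bengio, 2010) initialization.
            nn.init.xavier_uniform_(linear.weight)
        layers.append(linear)
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

model = make_mlp()
# SGD with the reported learning rate of 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
```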