How much does Initialization Affect Generalization?
Authors: Sameera Ramasinghe, Lachlan Ewen Macdonald, Moshiur Farazi, Hemanth Saratchandran, Simon Lucey
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further empirically test the developed theoretical insights using practical, deep networks. Finally, we contrast our framework with that supplied by the flat-minima conjecture and show that Fourier analysis grants a more reliable framework for understanding the generalization of neural networks. |
| Researcher Affiliation | Collaboration | 1Amazon, Australia 2Australian Institute of Machine Learning, University of Adelaide, Adelaide SA, Australia 3Machine Learning and Artificial Intelligence FSP, Data61-CSIRO. |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. The methods are described through prose and mathematical equations. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for their methodology. |
| Open Datasets | Yes | CIFAR10, CIFAR100, Tiny-ImageNet... VGG11 (Simonyan & Zisserman, 2014)... ImageNet. |
| Dataset Splits | No | The paper mentions "train splits of the datasets" and "test splits" but does not provide specific numerical details (percentages or sample counts) for how these splits were performed or explicitly mention a validation split for the main experiments in Table 1. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory) to run its experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers that would be needed to reproduce the experiments. |
| Experiment Setup | Yes | We use SGD to optimize the networks with a learning rate of 1e-4. The networks consist of 256 neurons in each hidden layer. All the networks are randomly initialized using Xavier initialization (Glorot & Bengio, 2010)... We use 4-layer networks where each layer's width is 256 neurons... We initialize the ReLU network using Xavier initialization and the Gaussian networks with N(0, 0.03). (See the sketch below the table.) |
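
As a rough illustration of the setup quoted in the last row, the sketch below builds a 4-layer, 256-wide MLP, applies either Xavier initialization (ReLU network) or N(0, 0.03) weights (Gaussian-activation variant), and creates an SGD optimizer with learning rate 1e-4. It is written in PyTorch; the input/output dimensions, the use of ReLU throughout, and the reading of 0.03 as a standard deviation are assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim=2, out_dim=1, width=256, depth=4, gaussian_init=False):
    """Build a 4-layer, 256-neuron-wide MLP as described in the setup row.

    Input/output sizes and activation placement are assumptions here.
    """
    layers = []
    dims = [in_dim] + [width] * depth + [out_dim]
    for i in range(len(dims) - 1):
        linear = nn.Linear(dims[i], dims[i + 1])
        if gaussian_init:
            # Gaussian-activation networks: weights drawn from N(0, 0.03)
            # (0.03 treated as the standard deviation; the paper does not say
            # whether it is the std or the variance).
            nn.init.normal_(linear.weight, mean=0.0, std=0.03)
        else:
            # ReLU networks: Xavier (Glorot & Bengio, 2010) initialization.
            nn.init.xavier_uniform_(linear.weight)
        layers.append(linear)
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

model = make_mlp()
# SGD with the reported learning rate of 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
```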