Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Mildly Overparameterized ReLU Networks Have a Favorable Loss Landscape
Authors: Kedar Karhadkar, Michael Murray, Hanna Tseran, Guido Montúfar
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally confirm these results by finding a phase transition from most regions having full rank Jacobian to many regions having deficient rank depending on the amount of overparameterization. |
| Researcher Affiliation | Academia | Kedar Karhadkar (EMAIL), University of California, Los Angeles; Michael Murray (EMAIL), University of California, Los Angeles; Hanna Tseran (EMAIL), University of Tokyo; Guido Montúfar (EMAIL), University of California, Los Angeles and Max Planck Institute for Mathematics in the Sciences |
| Pseudocode | No | The paper describes methodologies and theoretical findings but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The computer implementation of the scripts needed to reproduce our experiments can be found at https://github.com/kedar2/loss-landscape. |
| Open Datasets | Yes | We sample a dataset X ∈ ℝ^(d0×n) whose entries are sampled iid Gaussian with mean 0 and variance 1. |
| Dataset Splits | No | The paper describes generating synthetic datasets and using the MNIST dataset, and mentions 'training set size n' and a 'classification task on MNIST', but it does not specify any training, testing, or validation splits for these datasets. |
| Hardware Specification | Yes | The experiments in Section G.1 were run on the CPU of a MacBook Pro with an M2 chip and 8 GB RAM. The experiments in Section G.2 were run on a CPU cluster that uses Intel Xeon Ice Lake-SP processors (Platinum 8360Y) with 72 cores per node and 256 GB RAM. |
| Software Dependencies | No | Experiments were implemented in Python using PyTorch (Paszke et al., 2019), NumPy (Harris et al., 2020), and mpi4py (Dalcin et al., 2011). The plots were created using Matplotlib (Hunter, 2007). |
| Experiment Setup | Yes | We initialize our network with random weights and biases sampled iid uniformly on [−1/√d1, 1/√d1]. Weights and biases of the hidden units are sampled iid from the uniform distribution on the interval [−√(6/fan-in), √(6/fan-in)] according to the uniform-He initialization (He et al., 2015). The weights of the output layer are initialized as alternating 1 and −1 and look like [1, −1, 1, −1, . . .]. |
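The setup quoted above (Gaussian data, uniform-He hidden-layer initialization, alternating ±1 output weights) can be sketched as follows. This is a minimal NumPy illustration of the described recipe, not the authors' implementation (see their repository for that); the function names and the use of NumPy instead of PyTorch are our own assumptions.

```python
import numpy as np

def init_relu_network(d0, d1, seed=0):
    """Sketch of the initialization described in the quote above.

    Hidden weights and biases are drawn iid from
    Uniform[-sqrt(6/fan_in), sqrt(6/fan_in)] (uniform-He, He et al., 2015);
    output-layer weights alternate 1 and -1, i.e. [1, -1, 1, -1, ...].
    """
    rng = np.random.default_rng(seed)
    bound = np.sqrt(6.0 / d0)          # fan-in of the hidden layer is d0
    W1 = rng.uniform(-bound, bound, size=(d1, d0))
    b1 = rng.uniform(-bound, bound, size=d1)
    v = np.array([(-1.0) ** j for j in range(d1)])  # alternating +1/-1
    return W1, b1, v

def sample_dataset(d0, n, seed=1):
    """Dataset X in R^(d0 x n) with entries iid N(0, 1), as in the quote."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((d0, n))

def relu_net(X, W1, b1, v):
    """One-hidden-layer ReLU network: v^T ReLU(W1 X + b1), output shape (n,)."""
    return v @ np.maximum(W1 @ X + b1[:, None], 0.0)
```

For example, `relu_net(sample_dataset(3, 10), *init_relu_network(3, 4))` returns a length-10 vector of network outputs on 10 random inputs.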