Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mildly Overparameterized ReLU Networks Have a Favorable Loss Landscape

Authors: Kedar Karhadkar, Michael Murray, Hanna Tseran, Guido Montúfar

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally confirm these results by finding a phase transition from most regions having a full-rank Jacobian to many regions having deficient rank, depending on the amount of overparameterization.
Researcher Affiliation | Academia | Kedar Karhadkar (EMAIL), University of California, Los Angeles; Michael Murray (EMAIL), University of California, Los Angeles; Hanna Tseran (EMAIL), University of Tokyo; Guido Montúfar (EMAIL), University of California, Los Angeles and Max Planck Institute for Mathematics in the Sciences
Pseudocode | No | The paper describes methodologies and theoretical findings but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The computer implementation of the scripts needed to reproduce our experiments can be found at https://github.com/kedar2/loss-landscape.
Open Datasets | Yes | We sample a dataset X ∈ R^(d0×n) whose entries are sampled iid Gaussian with mean 0 and variance 1.
Dataset Splits | No | The paper describes generating synthetic datasets and using the MNIST dataset, and mentions a "training set size n" and a "classification task on MNIST", but it does not specify any training, validation, or test splits for these datasets.
Hardware Specification | Yes | The experiments in Section G.1 were run on the CPU of a MacBook Pro with an M2 chip and 8GB RAM. The experiments in Section G.2 were run on a CPU cluster that uses Intel Xeon Ice Lake-SP processors (Platinum 8360Y) with 72 cores per node and 256 GB RAM.
Software Dependencies | No | Experiments were implemented in Python using PyTorch (Paszke et al., 2019), numpy (Harris et al., 2020), and mpi4py (Dalcin et al., 2011). The plots were created using Matplotlib (Hunter, 2007).
Experiment Setup | Yes | We initialize our network with random weights and biases sampled iid uniformly on [−1/d1, 1/d1]. Weights and biases of the hidden units are sampled iid from the uniform distribution on the interval [−√(6/fan-in), √(6/fan-in)] according to the uniform-He initialization (He et al., 2015). The weights of the output layer are initialized as alternating 1 and −1 and look like [1, −1, 1, −1, . . .].
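The data-sampling and initialization steps quoted above can be sketched as follows. This is a minimal numpy sketch based only on the quoted description, not the authors' actual code (which is linked in the Open Source Code row); the dimensions d0, d1, n are illustrative choices, not values from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed for this sketch, not from the paper).
d0, d1, n = 10, 20, 100

# Dataset X in R^(d0 x n) with entries sampled iid N(0, 1).
X = rng.standard_normal((d0, n))

# Hidden-unit weights and biases: uniform-He initialization,
# iid uniform on [-sqrt(6 / fan_in), sqrt(6 / fan_in)] with fan_in = d0.
bound = np.sqrt(6.0 / d0)
W1 = rng.uniform(-bound, bound, size=(d1, d0))
b1 = rng.uniform(-bound, bound, size=d1)

# Output-layer weights fixed to alternating +1 / -1: [1, -1, 1, -1, ...].
v = np.array([(-1.0) ** j for j in range(d1)])

# One-hidden-layer ReLU network evaluated on the whole dataset.
y = v @ np.maximum(W1 @ X + b1[:, None], 0.0)
```

The fixed alternating output layer means only the hidden-layer parameters are random at initialization, which matches the setup the quoted passage describes.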