Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks
Authors: Julia Nakhleh, Robert D. Nowak
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform several simple experiments on synthetic data which suggest that our proposed ℓp path norm lends itself to practical application, recovering far sparser solutions more quickly than unregularized or weight decay-regularized gradient-based training. |
| Researcher Affiliation | Academia | Julia Nakhleh Department of Computer Science University of Wisconsin-Madison Madison, WI EMAIL Robert D. Nowak Department of Electrical and Computer Engineering University of Wisconsin-Madison Madison, WI EMAIL |
| Pseudocode | Yes | The full algorithm is summarized in Algorithm 1. |
| Open Source Code | Yes | Code for these experiments is available at https://github.com/julianakhleh/sparse_nns_lp. |
| Open Datasets | No | We perform several simple experiments on synthetic data which suggest that our proposed ℓp path norm lends itself to practical application... The first is a univariate peak/plateau dataset, which consists of the data/label pairs: (-2, 0), (-1, 0), (0, 1), (1, 1), (2, 0), (3, 0)... For our second experiment, we consider N = 10 data points in d = 50 dimensions. The coordinates of each data point xi are drawn i.i.d. from Unif[-1, 1], as are the labels yi. |
| Dataset Splits | No | The paper uses synthetic data for its experiments but does not explicitly mention any training, validation, or test splits. The goal is to train networks to interpolation, implying the entire dataset is used for this purpose without separate splits. |
| Hardware Specification | No | Our experiments are small-scale and computationally light and can easily be run on almost any computational setup, so we do not feel the need to report specifics on the compute resources. |
| Software Dependencies | No | We test our algorithm on two simple synthetic datasets... implemented in PyTorch using the Adam optimizer... along with that of Adam-only (no regularization) and AdamW weight decay. |
| Experiment Setup | Yes | All networks share the same random initialization and are trained with MSE loss for 100,000 epochs with learning rate γ = 0.01, regularization parameter λ = 0.003 (except for unregularized Adam-only, which uses λ = 0), and hidden layer width K = 80. For our second experiment... All networks are trained using MSE loss for 100,000 epochs with learning rate γ = 0.01, regularization parameter λ = 0.005 (except for unregularized Adam-only, which uses λ = 0), and hidden layer width K = 100. |
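
The two synthetic datasets and the hyperparameters quoted in the table are small enough to write down directly. The following is a minimal sketch, not the authors' code (which lives in the linked repository): the names `PEAK_PLATEAU`, `SETUP`, and `make_random_dataset` are hypothetical, the seed is arbitrary, and the negative inputs in the peak/plateau data assume that minus signs were lost when the quoted text was extracted. It uses only the Python standard library and does not implement the training itself.

```python
import random

# Univariate peak/plateau dataset as (x, y) pairs, as quoted in the
# "Open Datasets" row. Assumption: the first two inputs are -2 and -1.
PEAK_PLATEAU = [
    (-2.0, 0.0), (-1.0, 0.0), (0.0, 1.0),
    (1.0, 1.0), (2.0, 0.0), (3.0, 0.0),
]

# Hyperparameters quoted in the "Experiment Setup" row.
# The dictionary keys are illustrative names, not taken from the paper.
SETUP = {
    "peak_plateau": {"epochs": 100_000, "lr": 0.01, "lam": 0.003, "width": 80},
    "random_high_dim": {"epochs": 100_000, "lr": 0.01, "lam": 0.005, "width": 100},
}


def make_random_dataset(n=10, d=50, seed=0):
    """Draw n points in d dimensions with i.i.d. Unif[-1, 1] coordinates,
    and n labels drawn i.i.d. from the same distribution."""
    rng = random.Random(seed)  # seeded so the draw is repeatable
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return xs, ys
```

Calling `make_random_dataset()` with the defaults gives the N = 10, d = 50 configuration of the second experiment. Because no train/validation/test splits are reported, every point would be used for fitting.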