Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Authors: Julia Nakhleh, Robert D. Nowak

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We perform several simple experiments on synthetic data which suggest that our proposed $\ell^p$ path norm lends itself to practical application, recovering far sparser solutions more quickly than unregularized or weight decay-regularized gradient-based training. |
| Researcher Affiliation | Academia | Julia Nakhleh, Department of Computer Science, University of Wisconsin-Madison, Madison, WI, EMAIL; Robert D. Nowak, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI, EMAIL |
| Pseudocode | Yes | The full algorithm is summarized in Algorithm 1. |
| Open Source Code | Yes | Code for these experiments is available at https://github.com/julianakhleh/sparse_nns_lp. |
| Open Datasets | No | We perform several simple experiments on synthetic data which suggest that our proposed $\ell^p$ path norm lends itself to practical application... The first is a univariate peak/plateau dataset, which consists of the data/label pairs: (−2, 0), (−1, 0), (0, 1), (1, 1), (2, 0), (3, 0)... For our second experiment, we consider N = 10 data points in d = 50 dimensions. The coordinates of each data point $x_i$ are drawn i.i.d. from Unif[−1, 1], as are the labels $y_i$. |
| Dataset Splits | No | The paper uses synthetic data for its experiments but does not explicitly mention any training, validation, or test splits. The goal is to train networks to interpolation, implying the entire dataset is used for this purpose without separate splits. |
| Hardware Specification | No | Our experiments are small-scale and computationally light and can easily be run on almost any computational setup, so we do not feel the need to report specifics on the compute resources. |
| Software Dependencies | No | We test our algorithm on two simple synthetic datasets... implemented in PyTorch using the Adam optimizer... along with that of Adam-only (no regularization) and AdamW weight decay. |
| Experiment Setup | Yes | All networks share the same random initialization and are trained with MSE loss for 100,000 epochs with learning rate γ = 0.01, regularization parameter λ = 0.003 (except for unregularized Adam-only, which uses λ = 0), and hidden layer width K = 80. For our second experiment... All networks are trained using MSE loss for 100,000 epochs with learning rate γ = 0.01, regularization parameter λ = 0.005 (except for unregularized Adam-only, which uses λ = 0), and hidden layer width K = 100. |
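
For readers who want to recreate the inputs described in the Open Datasets and Experiment Setup entries, the two synthetic datasets and the reported hyperparameters can be written down roughly as follows. This is a minimal sketch using only the Python standard library, not the authors' code (see their repository for that). The names `PEAK_PLATEAU`, `make_random_dataset`, and `ExperimentConfig` are hypothetical, the seed is an assumption (none is reported), and the peak/plateau inputs are taken to run from −2 to 3 with the uniform range taken to be [−1, 1].

```python
import random
from dataclasses import dataclass

# Experiment 1: univariate peak/plateau dataset as (x, y) pairs.
# Assumption: the first two inputs are -2 and -1.
PEAK_PLATEAU = [
    (-2.0, 0.0), (-1.0, 0.0), (0.0, 1.0),
    (1.0, 1.0), (2.0, 0.0), (3.0, 0.0),
]


def make_random_dataset(n=10, d=50, seed=0):
    """Experiment 2: n points in d dimensions.

    Coordinates and labels are drawn i.i.d. from Unif[-1, 1].
    The seed is an assumption; the summary does not report one.
    """
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return xs, ys


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters as reported in the Experiment Setup entry."""
    hidden_width: int            # K
    reg_strength: float          # lambda; 0 for the unregularized Adam-only baseline
    learning_rate: float = 0.01  # gamma
    epochs: int = 100_000
    loss: str = "mse"


EXPERIMENT_1 = ExperimentConfig(hidden_width=80, reg_strength=0.003)
EXPERIMENT_2 = ExperimentConfig(hidden_width=100, reg_strength=0.005)
```

The configuration records only what the summary states. The optimizer wiring (Adam, AdamW, and the regularized method of Algorithm 1) is left to the linked repository.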