Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks
Authors: Eshaan Nichani, Alex Damian, Jason D. Lee
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We ran Algorithm 1 on both the single index and quadratic feature settings described in Section 4. Each trial was run with 5 random seeds. The solid lines represent the medians and the shaded areas represent the min and max values. For every trial we recorded both the test loss on a test set of size 215 and the linear correlation between the learned feature map ϕ(x) and the true intermediate feature h (x) where h (x) = x β for the single index setting and h (x) = x T Ax for the quadratic feature setting. Our results show that the test loss goes to 0 as the linear correlation between the learned feature map ϕ and the true intermediate feature h approaches 1. |
| Researcher Affiliation | Academia | Eshaan Nichani Princeton University EMAIL Alex Damian Princeton University EMAIL Jason D. Lee Princeton University EMAIL |
| Pseudocode | Yes | Algorithm 1 Layer-wise training algorithm |
| Open Source Code | No | The paper states 'Our experiments were written in JAX [14]', but [14] refers to the JAX library, not the authors' specific implementation code for the paper. |
| Open Datasets | No | The paper defines abstract data distributions like 'ν = N(0, I)' for single-index models and 'ν the uniform measure on Xd' for quadratic features, which are synthetic and not publicly available, named datasets. It doesn't use or cite any well-known public datasets like CIFAR-10 or MNIST. |
| Dataset Splits | Yes | We optimize the hyperparameters η1, λ using grid search over a holdout validation set of size 2^15 and report the final error over a test set of size 2^15. |
| Hardware Specification | Yes | Our experiments were written in JAX [14], and were run on a single NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper states 'Our experiments were written in JAX [14]'. While JAX is mentioned, a specific version number is not provided, and no other software dependencies with version numbers are listed. |
| Experiment Setup | Yes | Input: Initialization θ(0); learning rates η1, η2; weight decay λ; time T (Algorithm 1) ... We optimize the hyperparameters η1, λ using grid search over a holdout validation set of size 2^15 ... Both networks are initialized using the µP parameterization [57] and are trained using SGD with momentum on all layers simultaneously. |