Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Learning Hierarchical Polynomials of Multiple Nonlinear Features
Authors: Hengyu Fu, Zihao Wang, Eshaan Nichani, Jason Lee
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A NUMERICAL EXPERIMENTS: We empirically verify Theorem 1 and Proposition 1. [...] The left panel of Figure 2 demonstrates that our model outperforms the naive random-feature model across all dimensions. |
| Researcher Affiliation | Academia | Peking University. Email: EMAIL Stanford University. Email: EMAIL Princeton University. Email: EMAIL |
| Pseudocode | Yes | Algorithm 1 Layer-wise training algorithm |
| Open Source Code | No | No explicit statement about code availability or a repository link was found in the paper. |
| Open Datasets | No | Data distribution: Our aim is to learn the target function f : X → ℝ, with X ⊆ ℝ^d being the input space. Throughout the paper, we assume X = S^{d−1}(√d), that is, the sphere with radius √d in d dimensions. Also, we consider the data distribution to be the uniform distribution on the sphere, i.e., x ∼ Unif(X), and we draw two independent datasets D1, D2, with n1 and n2 i.i.d. samples, respectively. |
| Dataset Splits | Yes | Data distribution [...] we draw two independent datasets D1, D2, each with n1 and n2 i.i.d. samples, respectively. Thus, we draw n1 + n2 samples in total. [...] Training Algorithm Following Nichani et al. (2023), our network is trained via layer-wise gradient descent with sample splitting. [...] Algorithm 1 Layer-wise training algorithm Input: Learning rates η1, η2, weight decay λ1, λ2, parameter ϵ, number of steps T [...] 2 train W on dataset D1 [...] 8 train a on dataset D2 |
| Hardware Specification | No | No specific hardware details (like exact GPU/CPU models, processor types, or memory amounts) were found in the paper's experimental setup or any other section. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library or solver names with versions) were found in the paper. |
| Experiment Setup | Yes | Input: Learning rates η1, η2, weight decay λ1, λ2, parameter ϵ, number of steps T [...] We initialize each row of V to be drawn uniformly on the sphere of radius d [...] For the network architecture, we choose σ1 as per (2) and σ2 = Q2, with network sizes set to m1 = 10000 and m2 = 20000. [...] For the right panel, we conduct transfer learning with n1 = 216 pretraining samples and plot the dependence on n2. The figure reports the mean and normalized standard error of the test error using 10,000 fresh samples, based on 5 independent experimental instances. |
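
The Open Datasets and Dataset Splits rows describe synthetic data: inputs drawn uniformly from a sphere and split into two independent sets, D1 for training W and D2 for training a. A minimal sketch of that sampling step follows, using only the Python standard library. It is not the authors' code. The function names, the seed handling, and the default radius of sqrt(d) are assumptions made for illustration, and the target function and training loop are omitted.

```python
import math
import random


def sample_sphere(n, d, rng, radius=None):
    """Draw n points uniformly from a sphere in R^d.

    Assumption: the radius defaults to sqrt(d); pass `radius` to override.
    Normalizing a standard Gaussian vector gives a uniform direction.
    """
    if radius is None:
        radius = math.sqrt(d)
    points = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        points.append([radius * v / norm for v in g])
    return points


def draw_split_datasets(n1, n2, d, seed=0):
    """Return two independent input sets: D1 with n1 points, D2 with n2 points.

    Hypothetical helper: the split mirrors the sample splitting described
    in the table (first stage on D1, second stage on D2).
    """
    rng = random.Random(seed)
    d1 = sample_sphere(n1, d, rng)
    d2 = sample_sphere(n2, d, rng)
    return d1, d2
```

Calling `draw_split_datasets(n1, n2, d)` yields n1 + n2 fresh samples in total, which matches the sample count stated in the Dataset Splits row.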