Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
First-order ANIL provably learns representations despite overparametrisation
Authors: Oğuz Kaan Yüksel, Etienne Boursier, Nicolas Flammarion
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS: This section empirically studies the behavior of model-agnostic methods on a toy example. We consider a setup with a large but finite number of tasks $N = 5000$, feature dimension $d = 50$, a limited number of samples per task $m = 30$, small hidden dimension $k = 5$ and Gaussian label noise with variance $\sigma^2 = 4$. We study a largely misspecified problem where $k' = d$. To demonstrate that Theorem 1 holds more generally, we consider a non-identity covariance $\Sigma_\star$ proportional to $\mathrm{diag}(1, \dots, k)$. Further experimental details, along with additional experiments involving two-layer and three-layer ReLU networks, can be found in Appendix I. |
| Researcher Affiliation | Academia | Oğuz Kaan Yüksel, TML Lab, EPFL, EMAIL; Etienne Boursier, INRIA, Université Paris Saclay, LMO, EMAIL; Nicolas Flammarion, TML Lab, EPFL, EMAIL |
| Pseudocode | No | The paper does not include a pseudocode or algorithm block. |
| Open Source Code | No | No explicit statement or link providing access to source code for the methodology described in the paper was found. |
| Open Datasets | No | The paper describes a synthetic data generation process with specific distributions and parameters (e.g., 'Each row of $X_i$ is drawn i.i.d. according to $\mathcal{N}(0, I_d)$', 'task parameters $w_{\star,i}$ are drawn i.i.d. with $\mathbb{E}[w_{\star,i}] = 0$ and covariance matrix $\Sigma_\star := \mathbb{E}[w_{\star,i} w_{\star,i}^\top] = c I_k$'). It does not provide access to a pre-existing public dataset via a link, DOI, or citation. |
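| Dataset Splits | Yes | For $m_{\mathrm{in}} + m_{\mathrm{out}} = m$ with $m_{\mathrm{in}} < m$, we split the observations of each task as $(X_i^{\mathrm{in}}, y_i^{\mathrm{in}}) \in \mathbb{R}^{m_{\mathrm{in}} \times d} \times \mathbb{R}^{m_{\mathrm{in}}}$, the $m_{\mathrm{in}}$ first rows of $(X_i, y_i)$; and $(X_i^{\mathrm{out}}, y_i^{\mathrm{out}}) \in \mathbb{R}^{m_{\mathrm{out}} \times d} \times \mathbb{R}^{m_{\mathrm{out}}}$, the $m_{\mathrm{out}}$ last rows of $(X_i, y_i)$. ... samples are split into two subsets with $m_{\mathrm{in}} = 20$ and $m_{\mathrm{out}} = 10$ for model-agnostic methods. |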
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., CPU, GPU models, or cloud computing instances). |
| Software Dependencies | No | The paper mentions 'scipy' for LBFGS in Appendix I.1 but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We consider a setup with a large but finite number of tasks $N = 5000$, feature dimension $d = 50$, a limited number of samples per task $m = 30$, small hidden dimension $k = 5$ and Gaussian label noise with variance $\sigma^2 = 4$. ... Model-agnostic methods are all trained using step sizes $\alpha = \beta = 0.025$. ... The matrix $B_0$ is initialized randomly as an orthogonal matrix such that $B_0^\top B_0 = \frac{1}{4\alpha} I_{k'}$. The vector $w_0$ is initialized uniformly at random on the $k'$-dimensional sphere with squared radius $0.01 k' \alpha$. ... The regularization parameter $\lambda$ is tuned for each method using a grid search over multiple values. |
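
Since no source code is released, the quoted setup can be approximated with a minimal NumPy sketch of the synthetic data, the in/out split, and the initialization. This is not the authors' implementation. The function names, the `width` parameter (the number of columns of the learned matrix), the random seeds, the Gaussian law for the task parameters, the proportionality constant of 1 in the covariance, and the linear label model `y = X @ B_star @ w + noise` are all assumptions made for illustration; only the numeric values come from the excerpts in the table.

```python
import numpy as np


def make_tasks(n_tasks=5000, d=50, m=30, k=5, noise_var=4.0, seed=0):
    """Draw synthetic regression tasks (hypothetical helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    # Assumed ground-truth representation: d x k matrix with orthonormal columns.
    B_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
    # Task parameters: zero mean, covariance diag(1, ..., k).
    # Gaussian draws and a proportionality constant of 1 are assumptions.
    W_star = rng.standard_normal((n_tasks, k)) * np.sqrt(np.arange(1, k + 1))
    # Each row of X_i is i.i.d. N(0, I_d), as quoted in the table.
    X = rng.standard_normal((n_tasks, m, d))
    # Gaussian label noise with variance noise_var.
    noise = np.sqrt(noise_var) * rng.standard_normal((n_tasks, m))
    # Assumed linear label model: y_i = X_i B_star w_i + noise_i.
    y = np.einsum("nmd,dk,nk->nm", X, B_star, W_star) + noise
    return X, y, B_star, W_star


def split_in_out(X, y, m_in=20):
    """First m_in rows of each task go to the 'in' set, the rest to 'out'."""
    return (X[:, :m_in], y[:, :m_in]), (X[:, m_in:], y[:, m_in:])


def init_params(d=50, width=50, alpha=0.025, seed=1):
    """Initialization matching the quoted scaling; `width` is a hypothetical name."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, width)))
    # Rescale orthonormal columns so that B0^T B0 = I / (4 * alpha).
    B0 = Q / np.sqrt(4 * alpha)
    # A normalized Gaussian vector is uniform on the sphere; rescale so that
    # the squared radius equals 0.01 * width * alpha.
    w0 = rng.standard_normal(width)
    w0 *= np.sqrt(0.01 * width * alpha) / np.linalg.norm(w0)
    return B0, w0
```

Calling `make_tasks()` with its defaults allocates roughly 60 MB for `X`; passing a smaller `n_tasks` is enough to check shapes and the split before running at the reported scale.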