First-order ANIL provably learns representations despite overparametrisation
Authors: Oğuz Kaan Yüksel, Etienne Boursier, Nicolas Flammarion
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS This section empirically studies the behavior of model-agnostic methods on a toy example. We consider a setup with a large but finite number of tasks N = 5000, feature dimension d = 50, a limited number of samples per task m = 30, small hidden dimension k = 5 and Gaussian label noise with variance σ² = 4. We study a largely misspecified problem where k′ = d. To demonstrate that Theorem 1 holds more generally, we consider a non-identity covariance Σ⋆ proportional to diag(1, …, k). Further experimental details, along with additional experiments involving two-layer and three-layer ReLU networks, can be found in Appendix I. (A data-generation sketch follows the table.) |
| Researcher Affiliation | Academia | Oğuz Kaan Yüksel (TML Lab, EPFL, oguz.yuksel@epfl.ch); Etienne Boursier (INRIA, Université Paris-Saclay, LMO, etienne.boursier@inria.fr); Nicolas Flammarion (TML Lab, EPFL, nicolas.flammarion@epfl.ch) |
| Pseudocode | No | The paper does not include pseudocode or an algorithm block. |
| Open Source Code | No | No explicit statement or link providing access to source code for the methodology described in the paper was found. |
| Open Datasets | No | The paper describes a synthetic data generation process with specific distributions and parameters (e.g., 'Each row of X_i is drawn i.i.d. according to N(0, I_d)', 'task parameters w⋆,i are drawn i.i.d. with E[w⋆,i] = 0 and covariance matrix Σ⋆ := E[w⋆,i w⋆,iᵀ] = c·I_k'). It does not provide access to a pre-existing public dataset via a link, DOI, or citation. |
| Dataset Splits | Yes | For m_in + m_out = m with m_in < m, we split the observations of each task as (X_i^in, y_i^in) ∈ R^{m_in×d} × R^{m_in}, the m_in first rows of (X_i, y_i); and (X_i^out, y_i^out) ∈ R^{m_out×d} × R^{m_out}, the m_out last rows of (X_i, y_i). ... samples are split into two subsets with m_in = 20 and m_out = 10 for model-agnostic methods. (A splitting sketch follows the table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., CPU, GPU models, or cloud computing instances). |
| Software Dependencies | No | The paper mentions 'scipy' for L-BFGS in Appendix I.1 but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We consider a setup with a large but finite number of tasks N = 5000, feature dimension d = 50, a limited number of samples per task m = 30, small hidden dimension k = 5 and Gaussian label noise with variance σ² = 4. ... Model-agnostic methods are all trained using step sizes α = β = 0.025. ... The matrix B₀ is initialized randomly as an orthogonal matrix such that B₀ᵀB₀ = 1/(4α)·I_k′. The vector w₀ is initialized uniformly at random on the k′-dimensional sphere with squared radius 0.01 k′α. ... The regularization parameter λ is tuned for each method using a grid search over multiple values. (An initialization and training-step sketch follows the table.) |
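
The rows above quote the paper's synthetic data model. As a reading aid, here is a minimal sketch of that generator; the function name `make_tasks`, the Gaussian draw for the task parameters (the quoted text only fixes their mean and covariance), and the constant c scaling Σ⋆ are assumptions, not the authors' code.

```python
# Minimal sketch of the synthetic task generator quoted above.
# Assumptions (not from the paper's code): the function name, a Gaussian
# draw for w_star (the paper only fixes mean 0 and covariance Sigma_star),
# and the constant c scaling Sigma_star = c * diag(1, ..., k).
import numpy as np

def make_tasks(N=5000, d=50, m=30, k=5, noise_var=4.0, c=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Shared ground-truth representation B_star with k orthonormal columns.
    B_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
    # Non-identity covariance Sigma_star = c * diag(1, ..., k).
    sigma_diag = c * np.arange(1, k + 1, dtype=float)
    tasks = []
    for _ in range(N):
        w_star = rng.standard_normal(k) * np.sqrt(sigma_diag)  # E[w w^T] = Sigma_star
        X = rng.standard_normal((m, d))                    # rows i.i.d. N(0, I_d)
        z = rng.normal(scale=np.sqrt(noise_var), size=m)   # sigma^2 = 4
        y = X @ (B_star @ w_star) + z
        tasks.append((X, y))
    return B_star, tasks
```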
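The inner/outer split quoted in the Dataset Splits row amounts to slicing off the first m_in rows of each task for inner adaptation and keeping the last m_out rows for the outer loss. A sketch, with the helper name `split_task` being illustrative:

```python
# Sketch of the per-task inner/outer split (m_in = 20, m_out = 10 as quoted).
def split_task(X, y, m_in=20):
    """First m_in rows for inner adaptation, remaining rows for the outer loss."""
    return (X[:m_in], y[:m_in]), (X[m_in:], y[m_in:])
```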
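The Experiment Setup row specifies the initialization and step sizes. The sketch below combines them with one first-order ANIL outer iteration as we read it for the paper's two-layer linear setting (one inner gradient step on the head only, outer gradients without second-order terms); the function names and the 1/m loss normalization are assumptions rather than the paper's released code.

```python
# Hedged sketch: initialization from the Experiment Setup row plus one
# first-order ANIL outer iteration on a two-layer linear network x -> B w.
# The update equations are our reading of FO-ANIL; names and the 1/m loss
# normalization are assumptions.
import numpy as np

def init_params(d=50, k_prime=50, alpha=0.025, seed=0):
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, k_prime)))
    B0 = Q / np.sqrt(4 * alpha)  # orthogonal columns with B0.T @ B0 = I / (4 * alpha)
    w0 = rng.standard_normal(k_prime)
    w0 *= np.sqrt(0.01 * k_prime * alpha) / np.linalg.norm(w0)  # squared radius 0.01 k' alpha
    return B0, w0

def fo_anil_step(B, w, tasks, alpha=0.025, beta=0.025, m_in=20):
    grad_B, grad_w = np.zeros_like(B), np.zeros_like(w)
    for X, y in tasks:
        (X_in, y_in), (X_out, y_out) = (X[:m_in], y[:m_in]), (X[m_in:], y[m_in:])
        # Inner loop: one gradient step on the head w only (ANIL).
        r_in = X_in @ B @ w - y_in
        w_i = w - alpha * B.T @ (X_in.T @ r_in) / len(y_in)
        # Outer loop: first-order gradients evaluated at the adapted head w_i.
        r_out = X_out @ B @ w_i - y_out
        grad_B += np.outer(X_out.T @ r_out, w_i) / len(y_out)
        grad_w += B.T @ (X_out.T @ r_out) / len(y_out)
    n_tasks = len(tasks)
    return B - beta * grad_B / n_tasks, w - beta * grad_w / n_tasks
```

Usage would chain the pieces above, e.g. `B, w = init_params()` followed by repeated `B, w = fo_anil_step(B, w, tasks)` over the tasks returned by `make_tasks`.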