First-order ANIL provably learns representations despite overparametrisation

Authors: Oğuz Kaan Yüksel, Etienne Boursier, Nicolas Flammarion

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS. This section empirically studies the behavior of model-agnostic methods on a toy example. We consider a setup with a large but finite number of tasks N = 5000, feature dimension d = 50, a limited number of samples per task m = 30, small hidden dimension k = 5 and Gaussian label noise with variance σ² = 4. We study a largely misspecified problem where k′ = d. To demonstrate that Theorem 1 holds more generally, we consider a non-identity covariance Σ⋆ proportional to diag(1, …, k). Further experimental details, along with additional experiments involving two-layer and three-layer ReLU networks, can be found in Appendix I. (A data-generation sketch for this setup appears below the table.)
Researcher Affiliation | Academia | Oğuz Kaan Yüksel (TML Lab, EPFL) oguz.yuksel@epfl.ch; Etienne Boursier (INRIA, Université Paris-Saclay, LMO) etienne.boursier@inria.fr; Nicolas Flammarion (TML Lab, EPFL) nicolas.flammarion@epfl.ch
Pseudocode | No | The paper does not include pseudocode or an algorithm block.
Open Source Code | No | No explicit statement or link providing access to source code for the methodology described in the paper was found.
Open Datasets | No | The paper describes a synthetic data-generation process with specific distributions and parameters (e.g., 'Each row of X_i is drawn i.i.d. according to N(0, I_d)', 'task parameters w⋆,i are drawn i.i.d. with E[w⋆,i] = 0 and covariance matrix Σ⋆ := E[w⋆,i w⋆,iᵀ] = c I_k'). It does not provide access to a pre-existing public dataset via a link, DOI, or citation.
Dataset Splits | Yes | For m_in + m_out = m with m_in < m, we split the observations of each task as (X_i^in, y_i^in) ∈ R^{m_in × d} × R^{m_in}, the first m_in rows of (X_i, y_i); and (X_i^out, y_i^out) ∈ R^{m_out × d} × R^{m_out}, the last m_out rows of (X_i, y_i). ... samples are split into two subsets with m_in = 20 and m_out = 10 for model-agnostic methods. (A splitting sketch appears below the table.)
Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., CPU or GPU models, or cloud computing instances).
Software Dependencies | No | The paper mentions 'scipy' (for L-BFGS in Appendix I.1) but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | We consider a setup with a large but finite number of tasks N = 5000, feature dimension d = 50, a limited number of samples per task m = 30, small hidden dimension k = 5 and Gaussian label noise with variance σ² = 4. ... Model-agnostic methods are all trained using step sizes α = β = 0.025. ... The matrix B_0 is initialized randomly as an orthogonal matrix such that B_0ᵀ B_0 = (1/(4α)) I_{k′}. The vector w_0 is initialized uniformly at random on the k′-dimensional sphere with squared radius 0.01 k′ α. ... The regularization parameter λ is tuned for each method using a grid search over multiple values. (An initialization sketch appears below the table.)
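For readers who want to reproduce the toy setup, the quoted data-generation process (Gaussian inputs, task parameters with covariance proportional to diag(1, …, k), Gaussian label noise) can be sketched as below. This is a minimal sketch, not the authors' code: the linear shared-representation model y_i = X_i B⋆ w⋆,i + noise, the function names, and the use of numpy are our reading of the quoted setup.

```python
import numpy as np

# Dimensions quoted in the paper's experiments (Section 5).
N, d, m, k = 5000, 50, 30, 5
sigma2 = 4.0                       # label-noise variance

rng = np.random.default_rng(0)

# Ground-truth representation: a random d x k matrix with orthonormal
# columns (this particular construction is our assumption).
B_star, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Non-identity task covariance proportional to diag(1, ..., k).
Sigma_star = np.diag(np.arange(1, k + 1)).astype(float)
L = np.linalg.cholesky(Sigma_star)

def sample_task():
    """Draw one task: Gaussian inputs, a task vector with covariance
    Sigma_star, and noisy linear labels (modelling assumption)."""
    X = rng.standard_normal((m, d))        # rows i.i.d. N(0, I_d)
    w_star = L @ rng.standard_normal(k)    # E[w] = 0, Cov = Sigma_star
    y = X @ B_star @ w_star + np.sqrt(sigma2) * rng.standard_normal(m)
    return X, y

tasks = [sample_task() for _ in range(N)]
```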
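Continuing the sketch, the per-task split quoted under Dataset Splits (first m_in = 20 rows for the inner adaptation step, last m_out = 10 rows for the outer evaluation step) would look like this; the helper name split_task is ours:

```python
m_in, m_out = 20, 10   # m_in + m_out = m = 30, as quoted above

def split_task(X, y):
    """Inner/outer split used by the model-agnostic methods:
    the first m_in rows adapt the head, the last m_out rows
    evaluate the adapted model."""
    X_in, y_in = X[:m_in], y[:m_in]
    X_out, y_out = X[m_in:m_in + m_out], y[m_in:m_in + m_out]
    return (X_in, y_in), (X_out, y_out)
```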
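Finally, a sketch of the quoted initialization under Experiment Setup, reusing rng and d from the first sketch and taking the overparametrized width k′ = d from the quoted setup; the sampling recipe itself (QR factorization for the orthogonal matrix, a normalized Gaussian for the sphere) is our assumption:

```python
alpha = 0.025          # quoted step sizes alpha = beta
k_prime = d            # overparametrized hidden width, k' = d

# B_0: random orthogonal columns rescaled so that
# B_0^T B_0 = (1 / (4 * alpha)) * I, as quoted.
Q, _ = np.linalg.qr(rng.standard_normal((d, k_prime)))
B0 = Q / np.sqrt(4 * alpha)

# w_0: uniform on the k'-dimensional sphere with
# squared radius 0.01 * k' * alpha, as quoted.
v = rng.standard_normal(k_prime)
w0 = np.sqrt(0.01 * k_prime * alpha) * v / np.linalg.norm(v)
```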