Understanding Benign Overfitting in Gradient-Based Meta Learning

Authors: Lisha Chen, Songtao Lu, Tianyi Chen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | While our analysis uses the relatively tractable linear models, our theory contributes to understanding the delicate interplay among data heterogeneity, model adaptation and benign overfitting in gradient-based meta learning tasks. We corroborate our theoretical claims through numerical simulations.
Researcher Affiliation | Collaboration | Lisha Chen, Rensselaer Polytechnic Institute, Troy, NY, USA, chenl21@rpi.edu; Songtao Lu, IBM Research, Yorktown Heights, NY, USA, songtao@ibm.com; Tianyi Chen, Rensselaer Polytechnic Institute, Troy, NY, USA, chentianyi19@gmail.com
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] Work of theoretical nature.
Open Datasets | No | The paper discusses 'numerical simulations' based on a 'meta linear regression model' with 'Assumptions 2-4' about data properties, but does not specify a publicly available dataset used for these simulations.
Dataset Splits | Yes | For each task $m$, we observe $N$ samples with input feature $x_m \in \mathcal{X}_m \subset \mathbb{R}^d$ and target label $y_m \in \mathcal{Y}_m \subset \mathbb{R}$ drawn i.i.d. from a task-specific data distribution $\mathcal{P}_m$. These samples are collected in the dataset $\mathcal{D}_m = \{(x_{m,n}, y_{m,n})\}_{n=1}^{N}$, which is divided into the train and validation datasets, denoted as $\mathcal{D}_m^{\mathrm{tr}}$ and $\mathcal{D}_m^{\mathrm{va}}$. And $|\mathcal{D}_m^{\mathrm{tr}}| = N_{\mathrm{tr}}$ and $|\mathcal{D}_m^{\mathrm{va}}| = N_{\mathrm{va}}$ with $N = N_{\mathrm{tr}} + N_{\mathrm{va}}$.
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, or memory) used for running the numerical simulations.
Software Dependencies | No | The paper does not provide any specific software dependencies with version numbers used for the numerical simulations.
Experiment Setup | Yes | Figure 3: Excess risk vs number of samples ($N$) with different hyperparameters ($M = 10$, $d = 200$). Example 1 (Data covariance): Suppose $Q_m = \mathrm{diag}(I_{d_1}, \beta I_{d-d_1})$, $\forall m$. Set $M = 10$, $d = 200$, $d_1 = 20$, $\alpha = 0.1$ for MAML and $\gamma = 10^3$ for iMAML.
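
Since the paper ships no code, the Dataset Splits and Experiment Setup entries can be pictured with a minimal sketch. It is not the authors' implementation. It takes M = 10, d = 200, d1 = 20 and α = 0.1 from the quoted setup; everything else is an assumption made for illustration: the value of β, the sizes N_tr and N_va, Gaussian inputs, the noise level, how each task parameter is drawn, reading α as the step size of a single gradient step, and the 1/(2 N_tr) scaling of the squared loss. All function names are hypothetical.

```python
import math
import random


def covariance_diagonal(d=200, d1=20, beta=0.5):
    """Diagonal of Q = diag(I_{d1}, beta * I_{d - d1}); beta=0.5 is an assumed value."""
    return [1.0] * d1 + [beta] * (d - d1)


def sample_task(rng, q_diag, theta, n_tr, n_va, noise_std=0.1):
    """Draw N = n_tr + n_va i.i.d. samples for one task, then split them.

    Gaussian inputs, the linear label model and noise_std are assumptions.
    """
    samples = []
    for _ in range(n_tr + n_va):
        # Q is diagonal, so scaling standard normals by sqrt(q) gives covariance Q.
        x = [math.sqrt(q) * rng.gauss(0.0, 1.0) for q in q_diag]
        y = sum(xi * ti for xi, ti in zip(x, theta)) + rng.gauss(0.0, noise_std)
        samples.append((x, y))
    # The first n_tr samples form the train set, the remaining n_va the validation set.
    return samples[:n_tr], samples[n_tr:]


def one_step_adapt(theta0, train, alpha=0.1):
    """One gradient step on the assumed loss (1 / (2 * n_tr)) * sum (x^T theta - y)^2."""
    grad = [0.0] * len(theta0)
    for x, y in train:
        resid = sum(xi * ti for xi, ti in zip(x, theta0)) - y
        for j, xj in enumerate(x):
            grad[j] += resid * xj / len(train)
    return [t - alpha * g for t, g in zip(theta0, grad)]


rng = random.Random(0)
M, d, d1 = 10, 200, 20          # values quoted in the experiment setup
n_tr, n_va = 30, 20             # hypothetical split sizes, N = 50
q_diag = covariance_diagonal(d, d1)

tasks = []
for _ in range(M):
    # Hypothetical task-specific ground-truth parameter.
    theta_m = [rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)]
    tasks.append(sample_task(rng, q_diag, theta_m, n_tr, n_va))

train_0, val_0 = tasks[0]
theta_adapted = one_step_adapt([0.0] * d, train_0, alpha=0.1)
```

Only the diagonal of Q is stored because the covariance in Example 1 is diagonal; a non-diagonal Q would need a matrix square root (for instance a Cholesky factor) to sample the inputs.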