Provable Generalization of Overparameterized Meta-learning Trained with SGD
Authors: Yu Huang, Yingbin Liang, Longbo Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical findings are further validated by experiments. Figures 1, 2, and 3 provide experimental results. |
| Researcher Affiliation | Academia | Yu Huang, IIIS, Tsinghua University (y-huang20@mails.tsinghua.edu.cn); Yingbin Liang, Department of ECE, The Ohio State University (liang.889@osu.edu); Longbo Huang, IIIS, Tsinghua University (longbohuang@tsinghua.edu.cn) |
| Pseudocode | Yes | Algorithm 1: MAML with SGD |
| Open Source Code | No | The paper includes a self-assessment indicating code is provided in supplemental material, but the main text does not contain a specific statement or URL for open-source code. |
| Open Datasets | No | The paper uses a 'mixed linear regression model' and defines data distributions (e.g., '$x \in \mathbb{R}^d$ is mean zero with covariance operator $\Sigma = \mathbb{E}[xx^\top]$'). It describes the model but does not name a public dataset or provide access information for one. |
| Dataset Splits | Yes | Suppose that $\mathcal{D}_t$ is randomly split into training and validation sets, denoted respectively as $\mathcal{D}_t^{\text{in}} = (X_t^{\text{in}}, y_t^{\text{in}})$ and $\mathcal{D}_t^{\text{out}} = (X_t^{\text{out}}, y_t^{\text{out}})$, correspondingly containing $n_1$ and $n_2$ samples (i.e., $N = n_1 + n_2$). |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
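| Experiment Setup | Yes | $d = 500$, $T = 300$, $\lambda_i = \frac{1}{i \log(i+1)^2}$, $\beta^{\text{tr}} = 0.02$, $\beta^{\text{te}} = 0.2$ (Figure 1 caption). $d = 200$, $T = 100$, $\Sigma_\theta = \frac{0.8^2}{d} I$, $\beta^{\text{te}} = 0.2$ (Figure 2 caption). Let $s = T \log^p(T)$ and $d = T \log^q(T)$, where $p, q > 0$. Suppose $P_x$ is Gaussian and the spectrum of $\Sigma$ satisfies $\lambda_k = 1/s$ for $k \le s$ and $\lambda_k = 1/(d - s)$ for $s + 1 \le k \le d$. Suppose the spectral parameter $\nu_i$ of $\Sigma_\theta$ is $O(1)$, and let the step size $\alpha = \frac{1}{2 c(\beta^{\text{tr}}, \Sigma) \operatorname{tr}(\Sigma)}$ (Proposition 4). |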
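
The spectra and the train/validation split quoted in the table can be written out concretely. The block below is a minimal sketch, not the authors' code. The function names are hypothetical. It assumes natural logarithms, reads $\log(i+1)^2$ as $(\log(i+1))^2$, and rounds $s$ and $d$ to integers in the Proposition 4 setting; none of these choices is stated in the quoted text.

```python
import math
import random


def figure1_spectrum(d=500):
    """Eigenvalues lambda_i = 1 / (i * log(i + 1)^2) for i = 1..d.

    Assumption: natural log, with the square applied to the logarithm.
    """
    return [1.0 / (i * math.log(i + 1) ** 2) for i in range(1, d + 1)]


def proposition4_spectrum(T, p, q):
    """Two-level spectrum: 1/s on the first s directions, 1/(d - s) on the rest.

    Assumption: s = T * log(T)^p and d = T * log(T)^q are rounded to integers.
    """
    log_t = math.log(T)
    s = max(1, round(T * log_t ** p))
    d = round(T * log_t ** q)
    if d <= s:
        raise ValueError("need d > s for the second block of eigenvalues")
    return [1.0 / s] * s + [1.0 / (d - s)] * (d - s)


def split_task(samples, n1, seed=0):
    """Randomly split one task's N samples into n1 training and N - n1 validation samples."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n1], shuffled[n1:]


if __name__ == "__main__":
    lam = figure1_spectrum(d=500)
    print("Figure 1 spectrum: d =", len(lam), "largest eigenvalue =", lam[0])

    prop4 = proposition4_spectrum(T=300, p=1.0, q=2.0)
    print("Proposition 4 spectrum: d =", len(prop4), "trace =", sum(prop4))

    d_in, d_out = split_task(range(10), n1=4)
    print("n1 =", len(d_in), "n2 =", len(d_out))
```

With integer $s$ and $d$, each of the two blocks in the Proposition 4 spectrum contributes 1 to the trace, so $\operatorname{tr}(\Sigma) = 2$ in this sketch. That is the quantity entering the step size $\alpha$, whose constant $c(\beta^{\text{tr}}, \Sigma)$ is not reproduced here.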