Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
How Fine-Tuning Allows for Effective Meta-Learning
Authors: Kurtland Chua, Qi Lei, Jason D. Lee
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a formal construction and an experimental verification of the gap in Section C, and we extend the linear hard case to a nonlinear setting in Section G; the experimental verification itself appears in Section C.3. Our experiments only involve simulations of simple settings that do not require extensive compute. |
| Researcher Affiliation | Academia | Kurtland Chua Princeton University EMAIL Qi Lei Princeton University EMAIL Jason D. Lee Princeton University EMAIL |
| Pseudocode | No | The paper describes algorithms like ADAPTREP and FROZENREP but does not present them in a structured pseudocode or algorithm block format. |
| Open Source Code | Yes | A Jupyter notebook is provided to run the simulation outlined in Section C.3. |
| Open Datasets | No | The paper does not use named public datasets; instead, it describes synthetic data generation for its theoretical analysis and simulations. |
| Dataset Splits | No | We do not use train-validation splits, as is widespread in practice. This is motivated by results in Bai et al. (2020), which show that data splitting may be undesirable, assuming realizability. |
| Hardware Specification | No | Our experiments only involve simulations of simple settings that do not require extensive compute. |
| Software Dependencies | No | The paper mentions a Jupyter notebook for the simulations but does not list software dependencies or version numbers. |
| Experiment Setup | Yes | Source training. We consider the following regularized form of (1): $\min_B \min_{\Delta_t, w_t} \frac{1}{2n} \sum_{t=1}^{T} \lVert y_t - X_t(B + \Delta_t) w_t \rVert_2^2 + \frac{\lambda}{2}\lVert \Delta_t \rVert_F^2 + \frac{\gamma}{2}\lVert w_t \rVert_2^2$. In Section B, we show that the regularization is equivalent to regularizing $\sqrt{\lambda\gamma}\,\lVert \Delta_t w_t \rVert_2$, consistent with the intuition that $\Delta_t w_t$ has small norm. This additional regularization is necessary, since (1) only controls the norm of $\Delta_t$, which is insufficient for controlling $\Delta_t w_t$. Target training. Let $B_0$ be the output of (4) after orthonormalizing. We adapt to the target task via $L_\beta(\Delta, w) = \frac{1}{2n}\lVert y - \beta X (A_{B_0} + \Delta)(w_0 + w) \rVert_2^2$ (5), where $A_{B_0} := [B_0 \ B_0] \in \mathbb{R}^{d \times 2k}$ and $w_0 = [u, u]$ for a fixed unit-norm vector $u \in \mathbb{R}^k$. This corresponds to training a predictor of the form $x \mapsto \langle x, (A_{B_0} + \Delta)(w_0 + w) \rangle$. We optimize (5) by performing $T_{\mathrm{PGD}}$ steps of PGD with stepsize $\eta$ on (5) with $C_\beta := \{(\Delta, w) \mid \lVert \Delta \rVert_F \le c_1/\beta, \lVert w \rVert_2 \le c_2/\beta\}$ as the feasible set, where we explicitly define $c_1$ and $c_2$ in Section B. In Section C.3, it also states: |
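The target-training step quoted above can be sketched numerically: minimize the loss in (5) by projected gradient descent, projecting onto the constraint set after each step. The sketch below is illustrative only — the dimensions, stepsize, projection radii $c_1, c_2$, scaling $\beta$, and synthetic data are all assumed placeholders, not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumed, not from the paper)
d, k, n = 20, 3, 100
beta, eta, T_pgd = 1.0, 0.05, 200
c1, c2 = 1.0, 1.0  # radii for the feasible set C_beta

# Hypothetical orthonormalized source representation B0
B0, _ = np.linalg.qr(rng.standard_normal((d, k)))
A = np.hstack([B0, B0])              # A_{B0} in R^{d x 2k}, as quoted
u = rng.standard_normal(k)
u /= np.linalg.norm(u)               # fixed unit-norm vector u
w0 = np.concatenate([u, u])          # w0 = [u, u]

# Synthetic target task (placeholder data)
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)

def loss(Delta, w):
    """Loss (5): (1/2n) ||y - beta * X (A + Delta)(w0 + w)||^2."""
    r = beta * X @ (A + Delta) @ (w0 + w) - y
    return 0.5 * np.dot(r, r) / n

# PGD on (Delta, w), projecting onto
# C_beta = { ||Delta||_F <= c1/beta, ||w||_2 <= c2/beta }
Delta = np.zeros((d, 2 * k))
w = np.zeros(2 * k)
for _ in range(T_pgd):
    r = beta * X @ (A + Delta) @ (w0 + w) - y
    g_Delta = (beta / n) * np.outer(X.T @ r, w0 + w)   # gradient wrt Delta
    g_w = (beta / n) * (A + Delta).T @ (X.T @ r)        # gradient wrt w
    Delta -= eta * g_Delta
    w -= eta * g_w
    # Project each block back onto its norm ball
    nD = np.linalg.norm(Delta, "fro")
    if nD > c1 / beta:
        Delta *= (c1 / beta) / nD
    nw = np.linalg.norm(w)
    if nw > c2 / beta:
        w *= (c2 / beta) / nw

print(loss(np.zeros((d, 2 * k)), np.zeros(2 * k)), loss(Delta, w))
```

The projection step is what distinguishes this from plain gradient descent: each block is rescaled back onto its norm ball, matching the feasible set $C_\beta$ in the quoted setup.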