Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

How Fine-Tuning Allows for Effective Meta-Learning

Authors: Kurtland Chua, Qi Lei, Jason D. Lee

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We provide a formal construction and an experimental verification of the gap in Section C. Furthermore, we extend the linear hard case to a nonlinear setting in Section G. Our experiments only involve simulations of simple settings that do not require extensive compute." |
| Researcher Affiliation | Academia | "Kurtland Chua, Princeton University; Qi Lei, Princeton University; Jason D. Lee, Princeton University" |
| Pseudocode | No | The paper describes algorithms like ADAPTREP and FROZENREP but does not present them in a structured pseudocode or algorithm block format. |
| Open Source Code | Yes | "A Jupyter notebook is provided to run the simulation outlined in Section C.3." |
| Open Datasets | No | The paper does not use named public datasets; instead, it describes synthetic data generation for its theoretical analysis and simulations. |
| Dataset Splits | No | "We do not use train-validation splits, as is widespread in practice. This is motivated by results in Bai et al. (2020), which show that data splitting may be undesirable, assuming realizability." |
| Hardware Specification | No | "Our experiments only involve simulations of simple settings that do not require extensive compute." |
| Software Dependencies | No | The paper mentions a |
| Experiment Setup | Yes | "Source training. We consider the following regularized form of (1): $\min_B \min_{\Delta_t, w_t} \frac{1}{2n_S} \sum_{t=1}^{T} \lVert y_t - X_t(B + \Delta_t) w_t \rVert_2^2 + \frac{\lambda}{2} \lVert \Delta_t \rVert_F^2 + \frac{\gamma}{2} \lVert w_t \rVert_2^2$. In Section B, we show that the regularization is equivalent to regularizing $\sqrt{\lambda\gamma}\,\lVert \Delta_t w_t \rVert_2$, consistent with the intuition that $\Delta_t w_t$ has small norm. This additional regularization is necessary, since (1) only controls the norm of $\Delta_t$, which is insufficient for controlling $\Delta_t w_t$. Target training. Let $B_0$ be the output of (4) after orthonormalizing. We adapt to the target task via $L_\beta(\Delta, w) = \frac{1}{2n} \lVert y - \beta X (A_{B_0} + \Delta)(w_0 + w) \rVert_2^2$ (5), where $A_{B_0} := [B_0\ B_0] \in \mathbb{R}^{d \times 2k}$ and $w_0 = [u, u]$ for a fixed unit-norm vector $u \in \mathbb{R}^k$. This corresponds to training a predictor of the form $x \mapsto \langle x, (A_{B_0} + \Delta)(w_0 + w) \rangle$. We optimize (5) by performing $T_{\mathrm{PGD}}$ steps of PGD with stepsize $\eta$ on (5) with $C_\beta := \{(\Delta, w) \mid \lVert \Delta \rVert_F \le c_1/\beta,\ \lVert w \rVert_2 \le c_2/\beta\}$ as the feasible set, where we explicitly define $c_1$ and $c_2$ in Section B." In Section C.3, it also states: |
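The target-training procedure quoted in the Experiment Setup row, projected gradient descent on $L_\beta$ with a norm-ball feasible set, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the gradient derivation, and all concrete values (`c1`, `c2`, `eta`, `T_pgd`, and the synthetic data) are assumptions for demonstration.

```python
import numpy as np

def project_ball(v, radius, norm):
    """Project v onto the ball {x : norm(x) <= radius}."""
    n = norm(v)
    return v if n <= radius else v * (radius / n)

def adapt_target(X, y, A_B0, w0, beta, eta, T_pgd, c1, c2):
    """T_pgd steps of PGD on L_beta(Delta, w) = (1/2n)||y - beta*X(A_B0+Delta)(w0+w)||^2,
    projected onto C_beta = {(Delta, w) : ||Delta||_F <= c1/beta, ||w||_2 <= c2/beta}."""
    n, d = X.shape
    k2 = A_B0.shape[1]
    Delta = np.zeros((d, k2))
    w = np.zeros(k2)
    for _ in range(T_pgd):
        B = A_B0 + Delta
        v = w0 + w
        resid = y - beta * X @ (B @ v)           # residual, shape (n,)
        g = -(beta / n) * (X.T @ resid)          # shared gradient factor, shape (d,)
        grad_Delta = np.outer(g, v)              # dL/dDelta
        grad_w = B.T @ g                         # dL/dw
        Delta = project_ball(Delta - eta * grad_Delta, c1 / beta,
                             lambda M: np.linalg.norm(M, "fro"))
        w = project_ball(w - eta * grad_w, c2 / beta, np.linalg.norm)
    return Delta, w
```

The projection step keeps every iterate inside the feasible set $C_\beta$, matching the quoted constraint that $\lVert \Delta \rVert_F \le c_1/\beta$ and $\lVert w \rVert_2 \le c_2/\beta$ throughout adaptation.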