How Fine-Tuning Allows for Effective Meta-Learning
Authors: Kurtland Chua, Qi Lei, Jason D. Lee
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a formal construction and an experimental verification of the gap in Section C (with the experiment described in Section C.3). Furthermore, we extend the linear hard case to a nonlinear setting in Section G. Our experiments only involve simulations of simple settings that do not require extensive compute. |
| Researcher Affiliation | Academia | Kurtland Chua, Princeton University, kchua@princeton.edu; Qi Lei, Princeton University, qilei@princeton.edu; Jason D. Lee, Princeton University, jasonlee@princeton.edu |
| Pseudocode | No | The paper describes algorithms like ADAPTREP and FROZENREP but does not present them in a structured pseudocode or algorithm block format. |
| Open Source Code | Yes | A Jupyter notebook is provided to run the simulation outlined in Section C.3. |
| Open Datasets | No | The paper does not use named public datasets; instead, it describes synthetic data generation for its theoretical analysis and simulations. |
| Dataset Splits | No | We do not use train-validation splits, as is widespread in practice. This is motivated by results in Bai et al. (2020), which show that data splitting may be undesirable, assuming realizability. |
| Hardware Specification | No | Our experiments only involve simulations of simple settings that do not require extensive compute. |
| Software Dependencies | No | The paper mentions a Jupyter notebook for running the simulations but does not specify software dependencies or versions. |
| Experiment Setup | Yes | Source training. We consider the following regularized form of (1): $\min_B \min_{\Delta_t, w_t} \frac{1}{2n} \sum_{t=1}^{T} \| y_t - X_t (B + \Delta_t) w_t \|_2^2 + \frac{\lambda}{2} \|\Delta_t\|_F^2 + \frac{\gamma}{2} \|w_t\|_2^2$. In Section B, we show that the regularization is equivalent to regularizing $\sqrt{\lambda\gamma}\,\|\Delta_t w_t\|_2$, consistent with the intuition that $\Delta_t w_t$ has small norm. This additional regularization is necessary, since (1) only controls the norm of $\Delta_t$, which is insufficient for controlling $\Delta_t w_t$. Target training. Let $B_0$ be the output of (4) after orthonormalizing. We adapt to the target task via $L_\beta(\Delta, w) = \frac{1}{2n} \| y - \beta X (A_{B_0} + \Delta)(w_0 + w) \|_2^2$ (5), where $A_{B_0} := [B_0 \; B_0] \in \mathbb{R}^{d \times 2k}$ and $w_0 = [u, u]$ for a fixed unit-norm vector $u \in \mathbb{R}^k$. This corresponds to training a predictor of the form $x \mapsto \langle x, (A_{B_0} + \Delta)(w_0 + w) \rangle$. We optimize (5) by performing $T_{\mathrm{PGD}}$ steps of PGD with stepsize $\eta$ on (5) with $C_\beta := \{(\Delta, w) \mid \|\Delta\|_F \le c_1/\beta,\; \|w\|_2 \le c_2/\beta\}$ as the feasible set, where we explicitly define $c_1$ and $c_2$ in Section B. In Section C.3, it also states: |
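
To make the quoted source-training objective concrete, here is a minimal NumPy sketch of the regularized loss from Eq. (1) above. The function name, argument layout, and shapes are our own assumptions for illustration; they are not taken from the paper or its notebook.

```python
import numpy as np

def source_objective(B, deltas, ws, Xs, ys, lam, gamma):
    """Sketch of the regularized source-training objective quoted above.

    B      : (d, k) shared representation.
    deltas : list of (d, k) per-task perturbations Delta_t.
    ws     : list of (k,) per-task heads w_t.
    Xs, ys : per-task designs (n, d) and labels (n,).
    lam, gamma : regularization strengths for Delta_t and w_t.
    """
    n = Xs[0].shape[0]
    total = 0.0
    for X_t, y_t, D_t, w_t in zip(Xs, ys, deltas, ws):
        resid = y_t - X_t @ (B + D_t) @ w_t
        total += 0.5 / n * resid @ resid              # (1/2n)||y_t - X_t(B+Delta_t)w_t||^2
        total += 0.5 * lam * np.sum(D_t ** 2)         # (lambda/2)||Delta_t||_F^2
        total += 0.5 * gamma * w_t @ w_t              # (gamma/2)||w_t||_2^2
    return total
```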
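
The target-training step can be sketched the same way. The example below runs projected gradient descent on the quoted loss $L_\beta$ (Eq. (5)) with the feasible set $C_\beta$. The gradient expressions and the placement of $\beta$ follow our reading of the quoted formula, and all helper names and shapes are hypothetical rather than the authors' implementation.

```python
import numpy as np

def project(Delta, w, c1, c2, beta):
    """Project (Delta, w) onto C_beta = {||Delta||_F <= c1/beta, ||w||_2 <= c2/beta}."""
    fro = np.linalg.norm(Delta)
    if fro > c1 / beta:
        Delta = Delta * (c1 / beta) / fro
    nrm = np.linalg.norm(w)
    if nrm > c2 / beta:
        w = w * (c2 / beta) / nrm
    return Delta, w

def adapt_target(X, y, B0, u, beta, eta, T_pgd, c1, c2):
    """PGD sketch for L_beta(Delta, w) = (1/2n)||y - beta*X(A_B0+Delta)(w0+w)||^2,
    with A_B0 = [B0 B0] in R^{d x 2k} and w0 = [u, u]."""
    n, d = X.shape
    k = B0.shape[1]
    A_B0 = np.concatenate([B0, B0], axis=1)          # (d, 2k)
    w0 = np.concatenate([u, u])                      # (2k,)
    Delta = np.zeros((d, 2 * k))
    w = np.zeros(2 * k)
    for _ in range(T_pgd):
        resid = beta * X @ (A_B0 + Delta) @ (w0 + w) - y     # (n,)
        g = beta / n * X.T @ resid                           # shared gradient factor, (d,)
        grad_Delta = np.outer(g, w0 + w)                     # dL/dDelta, (d, 2k)
        grad_w = (A_B0 + Delta).T @ g                        # dL/dw, (2k,)
        Delta, w = project(Delta - eta * grad_Delta,
                           w - eta * grad_w, c1, c2, beta)
    return Delta, w
```

The projection simply rescales $\Delta$ and $w$ independently onto their norm balls, which matches the product-set form of $C_\beta$ quoted above; the constants $c_1$, $c_2$ would come from the paper's Section B.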