reproducibilityindex.ai

A Kernel-Based View of Language Model Fine-Tuning

Authors: Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on 14 NLP tasks validate our theory and show that formulating the downstream task as a masked word prediction problem through prompting often induces kernel-based dynamics during fine-tuning.
Researcher Affiliation	Academia	Department of Computer Science, Princeton University, Princeton, NJ, USA.
Pseudocode	No	The paper does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code	Yes	Our code and pre-computed kernels are publicly available at https://github.com/princeton-nlp/LM-Kernel-FT.
Open Datasets	Yes	We consider 14 NLP tasks, divided into 8 single sentence and 6 sentence pair datasets, which cover: sentiment analysis (SST-2, SST-5, MR, CR); classifying an opinion's polarity (MPQA) or subjectivity (Subj) or question type (TREC) or news topic (AG News); natural language inference (MNLI, SNLI, QNLI, RTE); and paraphrase detection tasks (MRPC, QQP). For each task, we randomly sample 5 k-shot datasets with k training examples for each label.
Dataset Splits	Yes	To generate k-shot few-shot datasets, the original training data is used to randomly sample k examples per label for training and another, separate k examples per label for the validation set.
Hardware Specification	No	The paper mentions using a "pre-trained RoBERTa-base (Liu et al., 2020b)" which is a model, but does not specify any hardware (e.g., GPU, CPU models, memory) used for running the experiments.
Software Dependencies	No	The paper states: "We use functorch (He & Zou, 2021) to compute the eNTK for RoBERTa-base". While it names a software library (functorch) and cites its paper, it does not provide a specific version number for functorch itself.
Experiment Setup	Yes	We use value ranges given by (Gao et al., 2021) and (Hu et al., 2021), and search over a wider range of values for SGD. Table 4 shows the hyperparameter grids for fine-tuning and the kernel method.
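
The k-shot sampling described under Open Datasets and Dataset Splits can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `sample_k_shot`, the toy data, the choice of k = 16, and the use of seeds 0 to 4 are all assumptions made for the example. It draws k examples per label for training and a separate, non-overlapping k per label for validation, then repeats this for 5 seeds.

```python
import random
from collections import defaultdict


def sample_k_shot(examples, k, seed):
    """Hypothetical helper: split (text, label) pairs into k-shot train/val sets.

    Returns two lists, each holding k examples per label, with no example
    shared between them.
    """
    rng = random.Random(seed)

    # Group the original training data by label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    train, val = [], []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < 2 * k:
            raise ValueError(f"label {label!r} has fewer than {2 * k} examples")
        # Draw 2k distinct examples: the first k go to train, the rest to val.
        picked = rng.sample(pool, 2 * k)
        train.extend(picked[:k])
        val.extend(picked[k:])
    return train, val


# Toy two-class data standing in for an original training set (assumption).
toy_data = [(f"sentence {i}", i % 2) for i in range(100)]

# Five k-shot datasets, one per seed; k = 16 is illustrative only.
splits = [sample_k_shot(toy_data, k=16, seed=s) for s in range(5)]
train0, val0 = splits[0]
```

Sampling 2k examples per label in a single draw and splitting them in half is one simple way to guarantee that the training and validation sets are disjoint.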