Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

Authors: Akiyoshi Tomihari, Issei Sato

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments using a Transformer-based model on multiple natural language processing datasets confirm our theoretical analysis." and "5 Numerical evaluation with transformer models: In this section, we numerically justify the following aspects obtained from our analysis:" |
| Researcher Affiliation | Academia | Akiyoshi Tomihari, The University of Tokyo (tomihari@g.ecc.u-tokyo.ac.jp); Issei Sato, The University of Tokyo (sato@g.ecc.u-tokyo.ac.jp) |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented or labeled in the paper. |
| Open Source Code | Yes | "Code is available at https://github.com/tom4649/lp-ft_ntk." |
| Open Datasets | Yes | "Datasets and models: We used a total of 13 classification datasets from various benchmarks: SuperGLUE [Wang et al., 2019], GLUE [Wang et al., 2018], BOSS [Yuan et al., 2023], and PubMed 20k RCT [Dernoncourt and Lee, 2017]." |
| Dataset Splits | Yes | "For the datasets from the GLUE, SuperGLUE, and BOSS benchmarks, we divided the original training set using a 9:1 training-to-validation ratio, using the original validation set as the test set, in accordance with Chen et al. [2022]." |
| Hardware Specification | Yes | "All experiments were run on a single NVIDIA A100 GPU." |
| Software Dependencies | No | "Our code is built on PyTorch [Paszke et al., 2019], using the Hugging Face Transformers library [Wolf et al., 2020] and AdapterHub [Pfeiffer et al., 2020]." |
| Experiment Setup | Yes | "Hyperparameter tuning, especially for learning rates during the FT stage of LP-FT, was conducted through a grid search based on the validation set performance." and "Details on the hyperparameters for our experiments can be found in Table 6." |
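
The Dataset Splits row above describes a 9:1 split of each original training set, with the original validation set reused as the test set. A minimal sketch of that protocol with the Hugging Face `datasets` library could look like the following; the benchmark and task name (`"glue"`, `"sst2"`) and the seed are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the 9:1 train/validation split described in the paper, with the
# original validation set reused as the test set. The dataset name ("glue",
# "sst2") and the seed are illustrative assumptions, not the paper's values.
from datasets import load_dataset

raw = load_dataset("glue", "sst2")

# Split the original training set 9:1 into train/validation.
split = raw["train"].train_test_split(test_size=0.1, seed=42)

dataset = {
    "train": split["train"],      # 90% of the original training set
    "validation": split["test"],  # 10% of the original training set
    "test": raw["validation"],    # original validation set used as the test set
}
```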
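The Experiment Setup row mentions a grid search over learning rates for the FT stage of LP-FT. Below is a rough PyTorch/Transformers sketch of the two-stage procedure, linear probing of the classification head with a frozen encoder followed by full fine-tuning, with the FT learning rate selected by validation accuracy. The model name, learning-rate grid, epoch counts, and the training and evaluation helpers are illustrative assumptions rather than the authors' exact configuration; see Table 6 of the paper and the released code for the actual settings.

```python
# Sketch of linear probing then fine-tuning (LP-FT) with a grid search over the
# FT-stage learning rate, selected on validation accuracy. Model name, learning
# rates, epoch counts, and the helpers are illustrative assumptions.
import copy
import torch
from transformers import AutoModelForSequenceClassification

def run_epochs(model, loader, optimizer, epochs, device="cuda"):
    """Plain training loop over batches with input_ids, attention_mask, labels."""
    model.to(device).train()
    for _ in range(epochs):
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

@torch.no_grad()
def accuracy(model, loader, device="cuda"):
    """Fraction of correctly classified examples in the loader."""
    model.to(device).eval()
    correct = total = 0
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        preds = model(**batch).logits.argmax(dim=-1)
        correct += (preds == batch["labels"]).sum().item()
        total += batch["labels"].numel()
    return correct / total

def lp_ft(train_loader, val_loader, num_labels, ft_lrs=(1e-5, 2e-5, 5e-5)):
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=num_labels
    )

    # Stage 1 (LP): freeze the encoder and train only the classification head.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("classifier")
    head_params = [p for p in model.parameters() if p.requires_grad]
    run_epochs(model, train_loader, torch.optim.AdamW(head_params, lr=1e-3), epochs=3)

    # Stage 2 (FT): unfreeze all parameters and grid-search the learning rate,
    # restarting each candidate run from the linearly probed checkpoint.
    best_model, best_acc = None, float("-inf")
    for lr in ft_lrs:
        candidate = copy.deepcopy(model)
        for param in candidate.parameters():
            param.requires_grad = True
        run_epochs(candidate, train_loader,
                   torch.optim.AdamW(candidate.parameters(), lr=lr), epochs=3)
        acc = accuracy(candidate, val_loader)
        if acc > best_acc:
            best_model, best_acc = candidate, acc
    return best_model
```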