Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

Authors: Akiyoshi Tomihari, Issei Sato

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental "Our experiments using a Transformer-based model on multiple natural language processing datasets confirm our theoretical analysis." and "5 Numerical evaluation with transformer models: In this section, we numerically justify the following aspects obtained from our analysis:"
Researcher Affiliation Academia Akiyoshi Tomihari The University of Tokyo EMAIL Issei Sato The University of Tokyo EMAIL
Pseudocode No No pseudocode or algorithm blocks are explicitly presented or labeled in the paper.
Open Source Code Yes Code is available at https://github.com/tom4649/lp-ft_ntk.
Open Datasets Yes "Datasets and models: We used a total of 13 classification datasets from various benchmarks: SuperGLUE [Wang et al., 2019], GLUE [Wang et al., 2018], BOSS [Yuan et al., 2023], and PubMed 20k RCT [Dernoncourt and Lee, 2017]."
Dataset Splits Yes "For the datasets from the GLUE, SuperGLUE, and BOSS benchmarks, we divided the original training set using a 9:1 training-to-validation ratio, using the original validation set as the test set, in accordance with Chen et al. [2022]."
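The 9:1 training-to-validation split described in the quote can be sketched as follows. This is a minimal illustration, not the authors' code; the shuffling, seed, and `split_train_validation` helper are assumptions for the example.

```python
import random

def split_train_validation(examples, val_ratio=0.1, seed=0):
    """Split a list of examples 9:1 into train/validation sets.

    The 9:1 ratio follows the paper's description; the shuffle and
    seed are illustrative assumptions, not the authors' exact setup.
    """
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(examples) * val_ratio)
    validation = [examples[i] for i in indices[:n_val]]
    train = [examples[i] for i in indices[n_val:]]
    return train, validation

train, validation = split_train_validation(list(range(100)))
print(len(train), len(validation))  # 90 10
```

The original validation set would then serve as the held-out test set, as stated in the quote.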
Hardware Specification Yes All experiments were run on a single NVIDIA A100 GPU.
Software Dependencies No "Our code is built on PyTorch [Paszke et al., 2019], using the Hugging Face Transformers library [Wolf et al., 2020] and AdapterHub [Pfeiffer et al., 2020]."
Experiment Setup Yes "Hyperparameter tuning, especially for learning rates during the FT stage of LP-FT, was conducted through a grid search based on the validation set performance." and "Details on the hyperparameters for our experiments can be found in Table 6."
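The learning-rate grid search over validation performance that the quote describes can be sketched as below. This is a hedged illustration: the candidate grid and the `evaluate` callback (learning rate in, validation metric out) are hypothetical stand-ins, not the paper's actual values or training loop.

```python
def grid_search_lr(candidate_lrs, evaluate):
    """Return the learning rate with the best validation score.

    `evaluate` is a hypothetical callback mapping a learning rate to a
    validation metric; in the paper it would run the FT stage of LP-FT
    and report validation performance.
    """
    best_lr, best_score = None, float("-inf")
    for lr in candidate_lrs:
        score = evaluate(lr)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score

# Toy stand-in for validation accuracy, peaked at 2e-5 for illustration.
best_lr, best_score = grid_search_lr(
    [1e-5, 2e-5, 5e-5], lambda lr: -abs(lr - 2e-5)
)
print(best_lr)  # 2e-05
```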