Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
Authors: Akiyoshi Tomihari, Issei Sato
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments using a Transformer-based model on multiple natural language processing datasets confirm our theoretical analysis." and, from Section 5 (Numerical evaluation with transformer models): "In this section, we numerically justify the following aspects obtained from our analysis:" |
| Researcher Affiliation | Academia | Akiyoshi Tomihari, The University of Tokyo (tomihari@g.ecc.u-tokyo.ac.jp); Issei Sato, The University of Tokyo (sato@g.ecc.u-tokyo.ac.jp) |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented or labeled in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/tom4649/lp-ft_ntk. |
| Open Datasets | Yes | Datasets and models: We used a total of 13 classification datasets from various benchmarks: SuperGLUE [Wang et al., 2019], GLUE [Wang et al., 2018], BOSS [Yuan et al., 2023], and PubMed 20k RCT [Dernoncourt and Lee, 2017]. |
| Dataset Splits | Yes | For the datasets from the GLUE, SuperGLUE, and BOSS benchmarks, we divided the original training set using a 9:1 training-to-validation ratio, using the original validation set as the test set, in accordance with Chen et al. [2022]. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA A100 GPU. |
| Software Dependencies | No | Our code is built on PyTorch [Paszke et al., 2019], using the Hugging Face Transformers library [Wolf et al., 2020] and AdapterHub [Pfeiffer et al., 2020]. |
| Experiment Setup | Yes | "Hyperparameter tuning, especially for learning rates during the FT stage of LP-FT, was conducted through a grid search based on the validation set performance." and "Details on the hyperparameters for our experiments can be found in Table 6." (A grid-search sketch follows the table.) |
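
The 9:1 split described in the "Dataset Splits" row can be reproduced with the Hugging Face `datasets` library that the released code already depends on. Below is a minimal sketch, assuming SST-2 from GLUE as an example task and an illustrative random seed; the paper's actual tasks and seeds may differ.

```python
# Minimal sketch, assuming SST-2 (GLUE) as an example task and an illustrative seed.
# The paper splits each original training set 9:1 into train/validation and reuses
# the original validation set as the test set, following Chen et al. [2022].
from datasets import load_dataset

raw = load_dataset("glue", "sst2")

split = raw["train"].train_test_split(test_size=0.1, seed=42)  # 9:1 split
train_set = split["train"]        # 90% of the original training set
val_set = split["test"]           # 10% held out as the validation set
test_set = raw["validation"]      # original validation set reused as the test set

print(len(train_set), len(val_set), len(test_set))
```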
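
The "Experiment Setup" row describes a grid search over learning rates for the FT stage of LP-FT, selected by validation-set performance. Below is a minimal sketch of that loop, assuming a roberta-base backbone, an illustrative learning-rate grid (not the values from the paper's Table 6), and hypothetical `train_one_stage` / `evaluate` helpers standing in for the actual training and evaluation code.

```python
# Minimal sketch of an FT-stage learning-rate grid search for LP-FT.
# Assumptions: roberta-base as the backbone, illustrative learning rates,
# and hypothetical helpers `train_one_stage` / `evaluate`.
from transformers import AutoModelForSequenceClassification


def set_backbone_trainable(model, trainable):
    # Freeze the backbone for the LP stage, unfreeze it for the FT stage;
    # the classification head stays trainable throughout.
    for name, param in model.named_parameters():
        if "classifier" not in name:
            param.requires_grad = trainable


def grid_search_lp_ft(train_set, val_set, ft_lrs=(1e-5, 2e-5, 5e-5)):
    best_lr, best_score = None, float("-inf")
    for ft_lr in ft_lrs:
        model = AutoModelForSequenceClassification.from_pretrained(
            "roberta-base", num_labels=2
        )

        set_backbone_trainable(model, False)          # LP stage: linear head only
        train_one_stage(model, train_set, lr=1e-3)    # hypothetical training helper

        set_backbone_trainable(model, True)           # FT stage: start from the LP solution
        train_one_stage(model, train_set, lr=ft_lr)   # hypothetical training helper

        score = evaluate(model, val_set)              # hypothetical validation helper
        if score > best_score:
            best_lr, best_score = ft_lr, score
    return best_lr, best_score
```

The point the sketch captures is that the FT stage starts from the linear-probed head rather than a randomly initialized one, which is what distinguishes LP-FT from plain fine-tuning; the learning-rate grid is then searched only for that second stage, with the best value chosen on the held-out validation split.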