Task-Specific Skill Localization in Fine-tuned Language Models
Authors: Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments suggest that localization via grafting can assist certain forms of continual learning. Our code is available at Skill-Localization-by-grafting. |
| Researcher Affiliation | Collaboration | *Equal contribution. Department of Computer Science, Princeton University. Correspondence to: Abhishek Panigrahi <ap34@princeton.edu>, Nikunj Saunshi <nsaunshi@google.com>. |
| Pseudocode | No | The paper describes its optimization procedure and other methods in prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at Skill-Localization-by-grafting: https://github.com/abhishekpanigrahi1996/Skill-Localization-by-grafting |
| Open Datasets | Yes | We fine-tuned the pre-trained RoBERTa-base (Liu et al., 2019b) model on 13 different tasks, with the majority from GLUE (Wang et al., 2018), including sentiment analysis, topic classification, natural language inference, and paraphrase detection datasets. |
| Dataset Splits | Yes | We make a random 95%/5% split of the training set to have a validation set for hyperparameter tuning. (A reproduction sketch of this split follows the table.) |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments, such as specific GPU models, CPU types, or cloud computing instances with detailed specifications. |
| Software Dependencies | No | The paper mentions software components like "RoBERTa-base", "GPT-2", "SGD optimizer", and "AdamW" but does not provide specific version numbers for these or any other libraries or dependencies, which are necessary for a reproducible software setup. |
| Experiment Setup | Yes | For SGD, we follow the grid {2, 4, 8} for batch size and {10^-2, 5×10^-3, 10^-3} for learning rate and apply a small weight decay of 10^-4 on all the model parameters during training. Model grafting experiments optimize Equation (3) using SGD with batch size 1024 (full-batch GD for 64-shot) for 100 steps with learning rate 10^7. (A hedged sketch of this grafting optimization follows the table.) |
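
The 95%/5% split reported in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch, assuming the Hugging Face `datasets` library and SST-2 from GLUE as a stand-in task; the task choice and seed are illustrative, not taken from the paper.

```python
# Hedged sketch: random 95%/5% train/validation split for hyperparameter tuning.
# Assumes the Hugging Face `datasets` library; SST-2 and the seed are illustrative choices.
from datasets import load_dataset

train = load_dataset("glue", "sst2", split="train")
split = train.train_test_split(test_size=0.05, seed=42)  # 95% train, 5% validation
train_set, val_set = split["train"], split["test"]
```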
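
The Experiment Setup row quotes only the hyperparameters for optimizing Equation (3), the grafting objective that selects which fine-tuned parameters to graft back onto the pretrained model. The sketch below is a hedged reconstruction of that mask optimization, assuming a sigmoid-relaxed mask, a generic `task_loss` callable, and a simple sparsity penalty; those parameterization details and all helper names are assumptions, not the authors' exact implementation. Only the step count and learning rate come from the quoted setup.

```python
# Hedged sketch of grafting-mask optimization (Equation (3) in the paper).
# theta_pre / theta_ft: dicts mapping parameter names to pretrained and fine-tuned tensors.
# task_loss(params, batch): callable that evaluates the task loss with the given parameters
# (e.g., via torch.func.functional_call). The sigmoid relaxation and sparsity penalty are
# assumptions; only the 100 steps and the learning rate mirror the quoted setup.
import itertools
import torch

def optimize_graft_mask(theta_pre, theta_ft, task_loss, data_loader,
                        steps=100, lr=1e7, sparsity_weight=1e-4):
    # One unconstrained score per parameter entry; sigmoid(score) acts as a soft mask gamma.
    scores = {name: torch.zeros_like(p, requires_grad=True)
              for name, p in theta_ft.items()}
    opt = torch.optim.SGD(list(scores.values()), lr=lr)

    for batch in itertools.islice(itertools.cycle(data_loader), steps):
        grafted, mask_size = {}, 0.0
        for name in theta_ft:
            gamma = torch.sigmoid(scores[name])
            # Grafted parameters: fine-tuned values where the mask is on, pretrained elsewhere.
            grafted[name] = gamma * theta_ft[name] + (1 - gamma) * theta_pre[name]
            mask_size = mask_size + gamma.sum()

        loss = task_loss(grafted, batch) + sparsity_weight * mask_size
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Binarize: keep the fine-tuned value only where the soft mask is confidently on.
    return {name: (torch.sigmoid(s) > 0.5) for name, s in scores.items()}
```

In the paper's setup this would be run with batch size 1024 (or full-batch gradient descent in the 64-shot regime); here the effective batch size is whatever `data_loader` yields.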