LoFiT: Localized Fine-tuning on LLM Representations

Authors: Fangcong Yin, Xi Ye, Greg Durrett

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LOFIT on question answering (QA), multi-hop reasoning, and counterfactual reasoning tasks, which are common settings for evaluating interpretability-motivated methods [21, 60]. We focus on a relatively low data condition: for each dataset, we sample 500 training points or fewer, to be consistent with the common low-data setup of representation intervention methods.
Researcher Affiliation | Academia | Fangcong Yin, The University of Texas at Austin, fangcongyin@utexas.edu; Xi Ye, Princeton University, xi.ye@princeton.edu; Greg Durrett, The University of Texas at Austin, gdurrett@cs.utexas.edu
Pseudocode | No | The paper describes the LOFIT methodology in text and Figure 1, but does not include a formally labeled “Pseudocode” or “Algorithm” block. (An illustrative sketch of the described two-stage procedure appears after this table.)
Open Source Code | Yes | Our code is available at https://github.com/fc2869/lo-fit.
Open Datasets | Yes | TruthfulQA [24] is a QA dataset with questions where humans are likely to give false answers because of common misconceptions. (Section 4) TruthfulQA [24] uses the Apache-2.0 license and data is available at: https://github.com/sylinrl/TruthfulQA. (Appendix I)
Dataset Splits | Yes | TruthfulQA [24]... We follow the setup in [21] to split TruthfulQA into train/dev/test sets of 326/82/407 questions... CLUTRR [41]... We use the subset of 2-hop questions and randomly split the data into train/dev/test sets of 300/450/450 QA pairs. MQuAKE [58]... Data is randomly split into train/dev/test sets of 134/95/864 QA pairs.
Hardware Specification | Yes | We fine-tune LOFIT and baselines using a single NVIDIA-RTX A6000 GPU with 48G memory.
Software Dependencies | No | We use the huggingface implementation of Transformers [51] in PyTorch for all fine-tuning, and the TRL [50] implementation of direct preference optimization [37] for fine-tuning on TruthfulQA.
Experiment Setup | Yes | We fine-tune LOFIT and baselines using a single NVIDIA-RTX A6000 GPU with 48G memory. We use the huggingface implementation of Transformers [51] in PyTorch for all fine-tuning, and the TRL [50] implementation of direct preference optimization [37] for fine-tuning on TruthfulQA. We use the AdamW optimizer for fine-tuning [26] with ϵ = 1e-8 and a weight decay factor of 0.01. (Appendix C.1) For all experiments... we fine-tuned for 5 epochs with a batch size of 8... Method-specific hyperparameters can be found in the following subsections. Hyperparameters of LOFIT used in each experiment are summarized in Table 6. (Appendix D) (A configuration sketch based on these reported details appears after this table.)
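
Since the paper gives no labeled pseudocode (see the Pseudocode row above), the following is a minimal PyTorch sketch of the kind of per-head intervention it describes: learned per-head scaling factors used to pick a small set of attention heads, followed by additive bias vectors on those heads. This is not the authors' released implementation (that is at https://github.com/fc2869/lo-fit); the class name, tensor shapes, and the exact head-selection rule are assumptions for illustration.

```python
# Minimal sketch of a LoFiT-style per-head intervention, assuming the two-stage
# procedure described in the paper (Figure 1): (1) learn per-head scaling
# factors and select the top-K heads, (2) learn additive bias vectors for the
# selected heads. Not the authors' code; names and shapes are illustrative.
import torch
import torch.nn as nn


class LoFiTHeadIntervention(nn.Module):
    def __init__(self, num_heads: int, head_dim: int, selected_heads=None):
        super().__init__()
        # Stage 1: per-head scaling factors (trained first, used to rank heads).
        self.scale = nn.Parameter(torch.ones(num_heads, 1))
        # Stage 2: per-head additive bias vectors (trained for selected heads only).
        self.bias = nn.Parameter(torch.zeros(num_heads, head_dim))
        self.selected_heads = selected_heads  # list of head indices, or None = all heads

    def forward(self, head_outputs: torch.Tensor) -> torch.Tensor:
        # head_outputs: (batch, seq_len, num_heads, head_dim)
        modified = head_outputs * self.scale + self.bias
        if self.selected_heads is None:
            return modified
        mask = torch.zeros(head_outputs.shape[2], 1, device=head_outputs.device)
        mask[self.selected_heads] = 1.0
        # Intervene only on the selected heads; pass the rest through unchanged.
        return head_outputs + mask * (modified - head_outputs)


# Example: rank heads by how far their learned scaling moved from 1.0
# (an assumed selection criterion), then keep the top-K for bias tuning.
def select_top_k_heads(intervention: LoFiTHeadIntervention, k: int):
    scores = (intervention.scale.detach().squeeze(-1) - 1.0).abs()
    return torch.topk(scores, k).indices.tolist()
```

In a setup like this, stage 1 would train only `scale` across all heads with the base LLM frozen, and stage 2 would update only the `bias` rows of the selected heads, which matches the parameter-efficient, localized character the paper claims for LoFiT.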
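For the Experiment Setup row, here is a minimal sketch of the reported fine-tuning configuration (Transformers plus TRL's DPO trainer for TruthfulQA, AdamW with ϵ = 1e-8, weight decay 0.01, 5 epochs, batch size 8). The model name, learning rate, and toy preference example are placeholders rather than values from the paper, and the TRL API shown follows recent releases.

```python
# Sketch of the reported fine-tuning setup (Appendix C.1 / D): TRL DPO trainer
# with AdamW (eps=1e-8, weight decay 0.01), 5 epochs, batch size 8.
# Model name, learning rate, and the toy dataset below are placeholders.
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; the paper covers several base LLMs
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Toy TruthfulQA-style preference pair; the real data is the 326-question train split.
train_dataset = Dataset.from_dict({
    "prompt": ["What happens if you crack your knuckles a lot?"],
    "chosen": ["Nothing in particular happens if you crack your knuckles a lot."],
    "rejected": ["You will develop arthritis."],
})

config = DPOConfig(
    output_dir="lofit-truthfulqa",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    optim="adamw_torch",
    adam_epsilon=1e-8,
    weight_decay=0.01,
    learning_rate=5e-4,  # placeholder; the paper's per-experiment values are in its Table 6
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # passed as `tokenizer=` in older TRL releases
)
trainer.train()
```

This reproduces only the hyperparameters quoted above; method-specific settings (including which heads LoFiT tunes and its learning rates) are deferred to the paper's Table 6 and Appendix D.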