Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Authors: Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the property of entity tracking, a crucial facet of language comprehension, where models fine-tuned on mathematics have substantial performance gains. We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions primarily the same circuit implements entity tracking. In fact, the entity tracking circuit of the original model on the fine-tuned versions performs better than the full original model. (ii) The circuits of all the models implement roughly the same functionality: Entity tracking is performed by tracking the position of the correct entity in both the original model and its fine-tuned versions. (iii) Performance boost in the fine-tuned models is primarily attributed to its improved ability to handle the augmented positional information. To uncover these findings, we employ: Path Patching, DCM, which automatically detects model components responsible for specific semantics, and CMAP, a new approach for patching activations across models to reveal improved mechanisms. |
| Researcher Affiliation | Academia | Northeastern University, MIT CSAIL, Technion – Israel Institute of Technology |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code, data and fully fine-tuned model can be accessed at https://finetuning.baulab.info. |
| Open Datasets | Yes | To explore the internal mechanism that enables entity tracking we adapt the dataset presented in Kim & Schuster (2023), aimed at evaluating the ability of a language model to track state changes of discourse entities. |
| Dataset Splits | Yes | We synthetically generated training (N = 1000) and eval datasets (N = 500), according to the desiderata. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only generally mentions "computing resources". |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We trained it for two epochs, with ADAM optimizer and a batch size of 32. |
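
The Open Datasets and Dataset Splits rows describe a synthetic entity-tracking dataset adapted from Kim & Schuster (2023), with 1000 training and 500 evaluation examples. Below is a minimal sketch of how such box-and-object prompts could be generated; the object vocabulary, number of boxes, and prompt template are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of synthetic entity-tracking data in the style of
# Kim & Schuster (2023): each example lists boxes with contents and asks
# which object a given box contains. Split sizes follow the reported
# desiderata (train N = 1000, eval N = 500); the template is an assumption.
import random

OBJECTS = ["apple", "key", "book", "coin", "ring", "map", "cup"]
BOXES = list("ABCDEFG")

def make_example(n_boxes=7, rng=random):
    objs = rng.sample(OBJECTS, n_boxes)
    context = " ".join(f"Box {b} contains the {o}." for b, o in zip(BOXES, objs))
    query_idx = rng.randrange(n_boxes)
    prompt = f"{context} Box {BOXES[query_idx]} contains the"
    return {"prompt": prompt, "target": objs[query_idx]}

random.seed(0)
train_set = [make_example() for _ in range(1000)]
eval_set = [make_example() for _ in range(500)]
print(train_set[0])
```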
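The Experiment Setup row reports only high-level hyperparameters: two epochs, the Adam optimizer, and a batch size of 32. A minimal sketch of a fine-tuning loop with those settings follows; the checkpoint, learning rate, and training texts are placeholders, since the quoted passage does not specify them.

```python
# Minimal fine-tuning sketch matching the reported settings
# (two epochs, Adam, batch size 32). Model, learning rate, and data
# are assumptions for illustration only.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Placeholder for the synthetic training split (N = 1000).
train_texts = ["Box A contains the key. Box B contains the apple."]
enc = tokenizer(train_texts, return_tensors="pt", padding=True)
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"])
loader = DataLoader(dataset, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # assumed learning rate

model.train()
for epoch in range(2):                      # "trained it for two epochs"
    for input_ids, attention_mask in loader:
        outputs = model(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=input_ids)   # standard causal LM loss
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```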
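The abstract quoted above introduces CMAP, an approach for patching activations across models to reveal improved mechanisms. The sketch below illustrates cross-model activation patching under two simplifying assumptions: the hooked module is a whole transformer block rather than individual attention heads, and the checkpoints are small placeholder models rather than the base/fine-tuned pair studied in the paper.

```python
# Minimal sketch of cross-model activation patching (CMAP-style):
# cache one block's hidden states from the fine-tuned model and substitute
# them into the same block of the base model during a forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LAYER = 6  # illustrative layer index

def get_block(model, layer):
    # GPT-2-style layout; LLaMA-style models use model.model.layers[layer].
    return model.transformer.h[layer]

@torch.no_grad()
def cmap_forward(base, finetuned, tokenizer, prompt, layer=LAYER):
    inputs = tokenizer(prompt, return_tensors="pt")

    # 1) Run the fine-tuned model and cache the block's output hidden states.
    cache = {}
    def save_hook(module, args, output):
        cache["h"] = output[0] if isinstance(output, tuple) else output
    handle = get_block(finetuned, layer).register_forward_hook(save_hook)
    finetuned(**inputs)
    handle.remove()

    # 2) Run the base model, overwriting that block's output with the cache.
    def patch_hook(module, args, output):
        if isinstance(output, tuple):
            return (cache["h"],) + output[1:]
        return cache["h"]
    handle = get_block(base, layer).register_forward_hook(patch_hook)
    logits = base(**inputs).logits
    handle.remove()
    return logits

if __name__ == "__main__":
    # Both checkpoints are placeholders; in the paper's setting these would
    # be the original model and its math fine-tuned counterpart.
    tok = AutoTokenizer.from_pretrained("gpt2")
    base = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    ft = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    out = cmap_forward(base, ft, tok, "Box A contains the key. Box A contains the")
    print(out[0, -1].topk(5).indices)
```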