Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

Authors: Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the property of entity tracking, a crucial facet of language comprehension, where models fine-tuned on mathematics show substantial performance gains. We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions, primarily the same circuit implements entity tracking. In fact, the entity tracking circuit of the original model, evaluated on the fine-tuned versions, performs better than the full original model. (ii) The circuits of all the models implement roughly the same functionality: entity tracking is performed by tracking the position of the correct entity in both the original model and its fine-tuned versions. (iii) The performance boost in the fine-tuned models is primarily attributed to their improved ability to handle the augmented positional information. To uncover these findings, we employ Path Patching; DCM, which automatically detects model components responsible for specific semantics; and CMAP, a new approach for patching activations across models to reveal improved mechanisms. (A cross-model activation patching sketch appears after this table.)
Researcher Affiliation | Academia | Northeastern University, MIT CSAIL, Technion - Israel Institute of Technology
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code, data, and fully fine-tuned model can be accessed at https://finetuning.baulab.info.
Open Datasets | Yes | To explore the internal mechanism that enables entity tracking, we adapt the dataset presented in Kim & Schuster (2023), aimed at evaluating the ability of a language model to track state changes of discourse entities.
Dataset Splits | Yes | We synthetically generated training (N = 1000) and evaluation (N = 500) datasets according to the desiderata. (See the data-generation sketch after this table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments; it only mentions "computing resources" in general terms.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We trained it for two epochs with the Adam optimizer and a batch size of 32. (A minimal fine-tuning sketch follows the table.)
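
The abstract row above mentions CMAP, which patches activations from a fine-tuned model into its base model. The following is a minimal illustrative sketch of cross-model activation patching, not the authors' implementation: the checkpoint names, the patched layer index, the Llama-style module path (model.layers[i]), and the prompt are all placeholder assumptions.

```python
# Minimal sketch of cross-model activation patching (CMAP-style), assuming two
# architecturally identical causal LMs: a base model and its fine-tuned version.
# Checkpoint names, the patched layer, and the prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "base-model-name"            # placeholder checkpoint
FINETUNED = "finetuned-model-name"  # placeholder checkpoint
LAYER = 20                          # hypothetical layer to patch

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE).eval()
tuned = AutoModelForCausalLM.from_pretrained(FINETUNED).eval()

prompt = "The key is in Box A, the apple is in Box B. Box B contains the"
inputs = tok(prompt, return_tensors="pt")

# 1) Cache the fine-tuned model's activation at the chosen decoder layer.
cache = {}
def save_hook(module, args, output):
    hidden = output[0] if isinstance(output, tuple) else output
    cache["act"] = hidden.detach()

handle = tuned.model.layers[LAYER].register_forward_hook(save_hook)
with torch.no_grad():
    tuned(**inputs)
handle.remove()

# 2) Re-run the base model with that layer's output overwritten by the cached
#    fine-tuned activation, and compare next-token predictions.
def patch_hook(module, args, output):
    if isinstance(output, tuple):
        return (cache["act"],) + output[1:]
    return cache["act"]

with torch.no_grad():
    clean = base(**inputs).logits[0, -1]
handle = base.model.layers[LAYER].register_forward_hook(patch_hook)
with torch.no_grad():
    patched = base(**inputs).logits[0, -1]
handle.remove()

print("base prediction:   ", tok.decode(clean.argmax()))
print("patched prediction:", tok.decode(patched.argmax()))
```

In the paper's setting, patching targets specific circuit components and accuracy is measured over a full evaluation set; the sketch above patches an entire layer's output on a single prompt only to keep the example short.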
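
The dataset rows report synthetically generated entity-tracking splits (N = 1000 train, N = 500 eval) adapted from Kim & Schuster (2023). Below is a hypothetical generator for boxes-style prompts; the template wording and the object/box vocabularies are made up for illustration, and only the split sizes follow the report.

```python
# Hypothetical generator for boxes-style entity-tracking prompts, loosely in the
# spirit of Kim & Schuster (2023). Template and vocabularies are illustrative;
# only the split sizes (1000 train / 500 eval) follow the report.
import random

OBJECTS = ["apple", "key", "book", "coin", "map", "ring", "letter"]
BOXES = list("ABCDEFG")

def make_example(rng, n_boxes=7):
    objs = rng.sample(OBJECTS, n_boxes)
    boxes = rng.sample(BOXES, n_boxes)
    context = ", ".join(f"the {o} is in Box {b}" for o, b in zip(objs, boxes))
    context = context[0].upper() + context[1:]   # capitalize first word only
    i = rng.randrange(n_boxes)                   # which box to query
    return {"prompt": f"{context}. Box {boxes[i]} contains the", "target": objs[i]}

def make_split(n, seed):
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]

train_set = make_split(1000, seed=0)   # N = 1000 training examples
eval_set = make_split(500, seed=1)     # N = 500 evaluation examples
print(train_set[0]["prompt"], "->", train_set[0]["target"])
```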
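
The experiment setup row states two epochs of training with the Adam optimizer and a batch size of 32. A minimal fine-tuning loop under those settings might look like the sketch below; the checkpoint name, learning rate, padding handling, and stand-in data are assumptions not taken from the paper.

```python
# Minimal fine-tuning loop matching the reported setup: two epochs, Adam,
# batch size 32. Checkpoint name, learning rate, padding handling, and the
# stand-in training strings are illustrative assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("base-model-name")   # placeholder checkpoint
if tok.pad_token is None:
    tok.pad_token = tok.eos_token                        # needed for batch padding
model = AutoModelForCausalLM.from_pretrained("base-model-name").train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # lr is an assumption

# Stand-in data; in practice this would be the training set described in the paper.
texts = ["The apple is in Box F, the key is in Box C. Box C contains the key"] * 1000
loader = DataLoader(texts, batch_size=32, shuffle=True)

for epoch in range(2):                                   # two epochs, as reported
    for batch in loader:
        enc = tok(list(batch), return_tensors="pt", padding=True)
        labels = enc["input_ids"].masked_fill(enc["attention_mask"] == 0, -100)
        loss = model(**enc, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```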