Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis

Authors: Xu Wang, Yan Hu, Wenyu Du, Reynold Cheng, Benyou Wang, Difan Zou

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In our experiments, we identify circuits at various checkpoints during fine-tuning and examine the interplay between circuit analysis, fine-tuning methods, and task complexities. First, we find that while circuits maintain high node similarity before and after fine-tuning, their edges undergo significant changes. This is in contrast to previous work (Prakash et al., 2024; Chhabra et al., 2024), which showed that circuits only gain some additional components after fine-tuning. Based on these observations, we develop a circuit-aware Low-Rank Adaptation (LoRA) method, which assigns ranks to layers based on edge changes in the circuits. Experimental results demonstrate that our circuit-based LoRA algorithm achieves an average performance improvement of 2.46% over standard LoRA with similar parameter sizes.
Researcher Affiliation Academia 1School of Computing and Data Science, The University of Hong Kong; 2School of Data Science, The Chinese University of Hong Kong, Shenzhen. This work was done while Xu Wang was working at The Chinese University of Hong Kong, Shenzhen, supervised by Dr. Yan Hu. Correspondence to: Difan Zou <EMAIL>.
Pseudocode Yes Algorithm 1 Circuit LoRA: Improve LoRA Using Circuit-Based Critical Layers Identification
Input: Pre-trained model M, pre-fine-tuning circuit C_before, post-fine-tuning circuit C_after, LoRA ranks r_o, r_c, scaling factors α, α_critical
Phase 1: Critical Layers Identification
  Compute edge differences e between C_before and C_after
  Aggregate e into layer scores and select critical layers L_critical
Phase 2: Module Replacement
  for each layer l ∈ M do
    if l ∈ L_critical then
      Replace l with Enhanced LoRALinear using r_c and α_critical
    else
      Replace l with LoRALinear using r_o and α
    end if
  end for
Return: Updated model M
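The two-phase procedure in Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the edge-difference scores, the top-k selection rule, and the concrete rank values are illustrative stand-ins.

```python
# Sketch of Algorithm 1 (Circuit LoRA): pick critical layers by aggregated
# circuit edge change, then assign them a higher LoRA rank. Edge scores,
# ranks, and the top-k threshold are illustrative, not the paper's values.

def select_critical_layers(edge_diff_per_layer, k):
    """Return the k layer indices with the largest aggregated edge change."""
    ranked = sorted(edge_diff_per_layer, key=edge_diff_per_layer.get, reverse=True)
    return set(ranked[:k])

def assign_lora_ranks(num_layers, edge_diff_per_layer, k,
                      r_ordinary=8, r_critical=32):
    """Map each layer index to the LoRA rank it should receive."""
    critical = select_critical_layers(edge_diff_per_layer, k)
    return {l: (r_critical if l in critical else r_ordinary)
            for l in range(num_layers)}

# Example: layer 2 changed most, layer 0 second-most.
diffs = {0: 0.4, 1: 0.1, 2: 0.9, 3: 0.2}
ranks = assign_lora_ranks(num_layers=4, edge_diff_per_layer=diffs, k=2)
print(ranks)  # layers 0 and 2 receive the higher rank
```

In a real training setup, the returned mapping would drive which layers get the enhanced LoRALinear replacement versus the ordinary one.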
Open Source Code Yes The code and data are available at https://github.com/Xu0615/FinetuneCircuits.
Open Datasets Yes To better understand fine-tuning mechanisms in practical settings, it is crucial to focus on tasks where fine-tuning leads to performance improvements. In this work, we design a class of mathematical tasks on which pre-trained large language models initially perform poorly with low accuracy, yet demonstrate a performance boost after fine-tuning. We employ the Edge Attribution Patching with Integrated Gradients (EAP-IG) (Hanna et al., 2024) method to identify circuits within both pre-trained and fine-tuned models. Surprisingly, we observe that this approach consistently finds circuits with high faithfulness, even though the two models differ markedly in performance (see Section 3). To further validate the stability of the discovered circuits, we introduce another circuit metric, robustness, which measures the stability of identified circuits by assessing their edge similarity under different perturbation ratios of the dataset. We show that when compared with a randomly initialized transformer model, the pre-trained model, despite exhibiting very low prediction accuracy, can still achieve substantially higher robustness. This finding further supports the validity of the circuits discovered during the fine-tuning process, irrespective of their prediction performance.
Our Main Findings. Based on the circuit analysis techniques and tasks introduced in Section 3, we provide a comprehensive interpretation of the key factors in the fine-tuning process. Specifically, we focus on three central research questions and summarize our main observations as follows. The code and data are available at https://github.com/Xu0615/FinetuneCircuits.
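The robustness metric described above compares the edges of circuits identified under different perturbations of the dataset. A minimal sketch of that comparison, assuming Jaccard similarity over edge sets (the paper's exact edge-similarity measure may differ):

```python
# Sketch of the robustness idea: compare the edge sets of circuits found
# on the original vs. a perturbed dataset. Jaccard similarity is an
# assumption here; edge names are hypothetical component identifiers.

def edge_similarity(edges_a, edges_b):
    """Jaccard similarity between two circuit edge sets."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

circuit_orig = {("mlp.3", "attn.5"), ("attn.5", "mlp.7"), ("embed", "attn.2")}
circuit_pert = {("mlp.3", "attn.5"), ("attn.5", "mlp.7"), ("embed", "attn.1")}
print(edge_similarity(circuit_orig, circuit_pert))  # 0.5
```

A robust model would keep this similarity high across perturbation ratios, while a randomly initialized model would not.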
Dataset Splits Yes For each task, we ensure a strict separation between the dataset used for fine-tuning and the dataset used for circuit analysis. Specifically, 80% of the dataset is allocated for fine-tuning, and the remaining 20% is reserved for identifying circuits and evaluating the model's and the circuit's accuracies. This separation guarantees that performance evaluation is conducted on data unseen during fine-tuning.
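The 80/20 separation described above can be sketched as a simple shuffled split; the seeding and shuffling details are illustrative, not taken from the authors' code.

```python
# Minimal sketch of the split described above: 80% for fine-tuning, 20%
# held out for circuit identification and accuracy evaluation.
import random

def split_dataset(samples, seed=0, train_frac=0.8):
    """Shuffle and split into (fine-tuning, held-out) portions."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

data = list(range(5000))  # e.g. a 5,000-sample subtask
train, held_out = split_dataset(data)
print(len(train), len(held_out))  # 4000 1000
```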
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions the models fine-tuned (Pythia-1.4B, gpt-neo-2.7B, opt-6.7B), not the underlying hardware.
Software Dependencies No The paper mentions specific LLM models (pythia-1.4B-deduped, gpt-neo-2.7B, opt-6.7B) and PEFT methods (LoRA, AdaLoRA, IA3) as well as the EAP-IG method. However, it does not specify versions for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup Yes Fine-tuning experiments were conducted across various arithmetic tasks, with configurations tailored to each. All tasks were trained with a batch size of 8, gradient accumulation steps of 4, a warmup of 50 steps, and a weight decay of 0.01.
- Addition and Subtraction (Add/Sub): subtasks with ranges of 100, 200, 300, 400, and 500, each with 5,000 samples. The 100-range subtask was trained for 2 epochs; the others for 4 epochs. LoRA experiments used ranks r = 2, 8, 16, 32 with a learning rate of 3e-4, except for the 400-range at r = 32 (lr = 2e-4). Full Parameter Fine-Tuning (FPFT) used learning rates of 8e-6 (100-range), 6e-6 (200-range), 5e-6 (400-range), and 4e-6 (500-range). Circuit LoRA applied higher learning rates (4e-4 or 5e-4) to Critical Layers and 3e-4 to non-Critical Layers.
- Multiplication and Division (Mul/Div): 2,000 samples, trained for 2 epochs. LoRA used a learning rate of 3e-4, FPFT used 4e-6, and Circuit LoRA used 2e-4 for Critical Layers and 3e-4 for non-Critical Layers.
- Arithmetic and Geometric Sequence (Sequence): 5,000 samples, trained for 4 epochs. LoRA experiments used a learning rate of 3e-4, FPFT used 8e-6, and Circuit LoRA applied 6e-4 (r = 32) and 5e-4 (r = 64) for Critical Layers, with 3e-4 for non-Critical Layers.
- Least Common Multiple (LCM): 2,500 samples, trained for 2 epochs. LoRA used learning rates of 3e-4 (r = 2, 8), 4e-4 (r = 16), and 2e-4 (r = 32). FPFT used 4e-6, and Circuit LoRA used 4e-4 (r = 32) and 6e-5 (r = 64) for Critical Layers, with 3e-4 for non-Critical Layers.
- Function Evaluation (Function): 5,000 samples, trained for 2 epochs. LoRA used a consistent learning rate of 3e-4 (r = 2, 8, 16, 32), FPFT used 8e-6, and Circuit LoRA used 4e-4 for Critical Layers and 3e-4 for non-Critical Layers.
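A recurring pattern in the setup above is that Circuit LoRA trains Critical Layers at a higher learning rate than the rest. One common way to realize such layer-wise rates is optimizer parameter groups; the sketch below illustrates that pattern and is not the authors' training code (layer names and rates are placeholders).

```python
# Sketch of layer-wise learning-rate assignment as described above:
# critical layers get a higher rate (e.g. 4e-4) than the rest (3e-4).
# Grouping into optimizer-style parameter groups is an illustrative
# pattern, not the paper's exact implementation.

def lr_groups(layer_names, critical_layers, lr_critical=4e-4, lr_other=3e-4):
    """Split layer names into two optimizer-style parameter groups."""
    critical = [n for n in layer_names if n in critical_layers]
    other = [n for n in layer_names if n not in critical_layers]
    return [{"params": critical, "lr": lr_critical},
            {"params": other, "lr": lr_other}]

layers = [f"layers.{i}" for i in range(6)]
groups = lr_groups(layers, critical_layers={"layers.2", "layers.5"})
print([g["lr"] for g in groups])  # [0.0004, 0.0003]
```

With real parameters instead of names, such groups could be passed directly to a PyTorch optimizer, which accepts a list of per-group options.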