Optimal ablation for interpretability

Authors: Maximilian Li, Lucas Janson

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We then show that using OA produces meaningful improvements for several common downstream applications of measuring component importance. In section 3, we apply OA to algorithmic circuit discovery (Conmy et al., 2023)... In section 4, we use OA to locate relevant components for factual recall (Meng et al., 2022)... In section 5, we apply OA to latent prediction (Belrose et al., 2023a)..." |
| Researcher Affiliation | Academia | "Maximilian Li, Harvard University; Lucas Janson, Harvard University" |
| Pseudocode | Yes | "Algorithm 1: Uniform gradient sampling" |
| Open Source Code | Yes | "All code can be found at https://github.com/maxtli/optimalablation." |
| Open Datasets | Yes | "The Indirect Object Identification (IOI) subtask (Wang et al., 2022)..." |
| Dataset Splits | No | "We train OAT on 60% of the dataset and evaluate both methods on the other 40%." |
| Hardware Specification | Yes | "All experiments were run on a single Nvidia A100 GPU with 80GB VRAM." |
| Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., Python 3.x, PyTorch 1.y, CUDA z.w). |
| Experiment Setup | Yes | "We use learning rates between 0.01 and 0.15 for the sampling parameters. We use a learning rate of 0.002 for a..." |