Optimal ablation for interpretability
Authors: Maximilian Li, Lucas Janson
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then show that using OA produces meaningful improvements for several common downstream applications of measuring component importance. In section 3, we apply OA to algorithmic circuit discovery (Conmy et al., 2023)... In section 4, we use OA to locate relevant components for factual recall (Meng et al., 2022)... In section 5, we apply OA to latent prediction (Belrose et al., 2023a)... |
| Researcher Affiliation | Academia | Maximilian Li Harvard University Lucas Janson Harvard University |
| Pseudocode | Yes | Algorithm 1 Uniform gradient sampling |
| Open Source Code | Yes | All code can be found at https://github.com/maxtli/optimalablation. |
| Open Datasets | Yes | The Indirect Object Identification (IOI) subtask (Wang et al., 2022)... |
| Dataset Splits | Yes | We train OAT on 60% of the dataset and evaluate both methods on the other 40%. |
| Hardware Specification | Yes | All experiments were run on a single Nvidia A100 GPU with 80GB VRAM. |
| Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., Python 3.x, PyTorch 1.y, CUDA z.w). |
| Experiment Setup | Yes | We use learning rates between 0.01 and 0.15 for the sampling parameters. We use a learning rate of 0.002 for a... |
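The 60/40 train/evaluation split quoted in the Dataset Splits row can be sketched as follows. This is a minimal, hypothetical illustration; the function name, the fixed seed, and the toy dataset are assumptions, and the paper's actual data pipeline is not shown here.

```python
import random


def train_eval_split(examples, train_frac=0.6, seed=0):
    """Shuffle examples and split them into train/eval partitions.

    Hypothetical helper mirroring the 60/40 split quoted above;
    the seed is fixed only so the split is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# With 100 examples, a 0.6 fraction yields 60 train / 40 eval items.
train_set, eval_set = train_eval_split(range(100))
print(len(train_set), len(eval_set))  # 60 40
```

Because the split fraction is a parameter, the same helper covers other ratios (e.g. `train_frac=0.8` for an 80/20 split).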