Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Optimal ablation for interpretability
Authors: Maximilian Li, Lucas Janson
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then show that using OA produces meaningful improvements for several common downstream applications of measuring component importance. In section 3, we apply OA to algorithmic circuit discovery (Conmy et al., 2023)... In section 4, we use OA to locate relevant components for factual recall (Meng et al., 2022)... In section 5, we apply OA to latent prediction (Belrose et al., 2023a)... |
| Researcher Affiliation | Academia | Maximilian Li (Harvard University); Lucas Janson (Harvard University) |
| Pseudocode | Yes | Algorithm 1 Uniform gradient sampling |
| Open Source Code | Yes | All code can be found at https://github.com/maxtli/optimalablation. |
| Open Datasets | Yes | The Indirect Object Identification (IOI) subtask (Wang et al., 2022)... |
| Dataset Splits | No | We train OAT on 60% of the dataset and evaluate both methods on the other 40%. |
| Hardware Specification | Yes | All experiments were run on a single Nvidia A100 GPU with 80GB VRAM. |
| Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., Python 3.x, PyTorch 1.y, CUDA z.w). |
| Experiment Setup | Yes | We use learning rates between 0.01 and 0.15 for the sampling parameters. We use a learning rate of 0.002 for a... |
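For illustration, the 60/40 train/evaluation split quoted under Dataset Splits could be reproduced with a sketch like the one below. The shuffling, seeding, and function name are assumptions for the example, not details taken from the paper.

```python
import random

def train_eval_split(dataset, train_frac=0.6, seed=0):
    """Shuffle a dataset and split it into train/eval subsets.

    Minimal sketch of a 60/40 split as described in the paper's
    quoted text; the exact sampling procedure is an assumption.
    """
    idx = list(range(len(dataset)))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for reproducibility
    cut = int(len(dataset) * train_frac)
    train = [dataset[i] for i in idx[:cut]]
    eval_set = [dataset[i] for i in idx[cut:]]
    return train, eval_set

train, eval_set = train_eval_split(list(range(100)))
print(len(train), len(eval_set))  # 60 40
```

Fixing the seed makes the split reproducible across runs, which matters when comparing methods evaluated on the same held-out 40%.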