Decomposing and Editing Predictions by Modeling Model Computation
Authors: Harshay Shah, Andrew Ilyas, Aleksander Madry
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on both image classifiers and language models, we show that COAR yields component attributions that can accurately predict how model predictions change in response to component-level ablations (Section 4). We use COAR to obtain component attributions (one for each test example) in each setup. |
| Researcher Affiliation | Academia | Harshay Shah¹, Andrew Ilyas¹, Aleksander Madry¹ (¹MIT). Correspondence to: Harshay Shah <harshay@mit.edu>. |
| Pseudocode | Yes | We provide pseudocode for COAR in Appendix E.1. Figure 8: Pseudocode for estimating component attributions with COAR. |
| Open Source Code | Yes | Our code is available at github.com/MadryLab/modelcomponents. |
| Open Datasets | Yes | Setup A: A ResNet-18 (He et al., 2015) trained on the CIFAR-10 dataset (Krizhevsky, 2009), with a computation graph GA comprising \|C\| = 2,306 components. Setup B: A ResNet-50 trained on the ImageNet dataset (Deng et al., 2009), with a computation graph GB comprising \|C\| = 22,720 components. Setup C: A Vision Transformer (ViT-B/16) (Dosovitskiy et al., 2021) trained on ImageNet, whose computation graph GC comprises 82,944 components. |
| Dataset Splits | Yes | Figure 3c shows that we can individually fix every misclassification in the ImageNet validation set while incurring a median accuracy drop of 0.2% on the training set (top row) and validation set (bottom row). We use a validation set comprising examples with and without the synthetic attack to select the number of components to ablate from the model. |
| Hardware Specification | Yes | We train our models and compute COAR attributions on a cluster of machines, each with 9 NVIDIA A100 or V100 GPUs and 96 CPU cores. |
| Software Dependencies | No | The paper mentions using 'captum library', 'FFCV library', and 'fast-l1 package' but does not specify their version numbers or other core software dependencies like Python or PyTorch versions. |
| Experiment Setup | Yes | Specifically, for a given model, we first construct a component dataset D(z) for each example z (as in Step 1 of Section 3) by randomly ablating an α_train fraction of all components and evaluating the resulting margin (Eq. 5) on z, where α_train = {10%, 5%, 5%} for setups {A, B, C} above. We repeat this m times, yielding a component dataset D(z) of size m for each example z; we use m = {50,000, 100,000, 200,000} for setups {A, B, C} above. We choose to ablate component subsets C′ ⊆ C by simply setting the parameters of the components in C′ to zero (Wang et al., 2022; Olsson et al., 2022). (A minimal sketch of this procedure follows the table.) |
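
For concreteness, the sketch below illustrates the ablation-and-regression procedure quoted in the Experiment Setup row: repeatedly ablate a random α_train fraction of components, record the resulting margin on an example z, and regress the margins on the ablation masks to obtain per-component attributions. This is a minimal sketch, not the authors' implementation: the `ablate_and_score` callable, the function name, the Lasso solver (a stand-in for the fast-l1 package the paper mentions), and all hyperparameter values are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of COAR-style component attribution
# for a single example z. Assumes a user-supplied callable
# `ablate_and_score(component_idxs)` that zeroes out the parameters of the
# given components and returns the model's margin on z.
import numpy as np
from sklearn.linear_model import Lasso  # illustrative stand-in for fast-l1


def estimate_coar_attribution(num_components, ablate_and_score,
                              alpha_train=0.05, m=1000, seed=0):
    """Fit a linear map from ablation masks to margins for one example z."""
    rng = np.random.default_rng(seed)
    k = int(alpha_train * num_components)        # components ablated per draw
    masks = np.zeros((m, num_components), dtype=np.float32)
    margins = np.zeros(m, dtype=np.float32)

    for i in range(m):
        # Step 1: randomly ablate an alpha_train fraction of all components
        # and record the resulting margin on z (the "component dataset" D(z)).
        ablated = rng.choice(num_components, size=k, replace=False)
        masks[i, ablated] = 1.0                  # 1 = ablated, 0 = kept
        margins[i] = ablate_and_score(ablated)

    # Step 2: regress margins on ablation masks; the learned coefficients
    # serve as the per-component attributions for example z.
    reg = Lasso(alpha=0.01).fit(masks, margins)
    return reg.coef_, reg.intercept_
```

In the paper's setups, m ranges from 50,000 to 200,000 ablation masks per example and the regression is fit over all |C| components (up to 82,944 for ViT-B/16), so the mask collection and regression steps dominate the computational cost.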