Decomposing and Editing Predictions by Modeling Model Computation

Authors: Harshay Shah, Andrew Ilyas, Aleksander Madry

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments on both image classifiers and language models, we show that COAR yields component attributions that can accurately predict how model predictions change in response to component-level ablations (Section 4). We use COAR to obtain component attributions (one for each test example) in each setup. Specifically, for a given model, we first construct a component dataset D(z) for each example z (as in Step 1 of Section 3) by randomly ablating an α_train fraction of all components and evaluating the resulting margin (5) on z, where α_train = {10%, 5%, 5%} for setups {A, B, C} above.
Researcher Affiliation | Academia | Harshay Shah, Andrew Ilyas, Aleksander Madry (MIT). Correspondence to: Harshay Shah <harshay@mit.edu>.
Pseudocode | Yes | We provide pseudocode for COAR in Appendix E.1. Figure 8: Pseudocode for estimating component attributions with COAR.
Open Source Code | Yes | Our code is available at github.com/MadryLab/modelcomponents.
Open Datasets | Yes | Setup A: A ResNet-18 (He et al., 2015) trained on the CIFAR-10 dataset (Krizhevsky, 2009), with a computation graph G_A comprising |C| = 2,306 components. Setup B: A ResNet-50 trained on the ImageNet dataset (Deng et al., 2009), with a computation graph G_B comprising |C| = 22,720 components. Setup C: A Vision Transformer (ViT-B/16) (Dosovitskiy et al., 2021) trained on ImageNet, whose computation graph G_C comprises |C| = 82,944 components.
Dataset Splits | Yes | Figure 3c shows that we can individually fix every misclassification in the ImageNet validation set while incurring a median accuracy drop of 0.2% on the training set (top row) and validation set (bottom row). We use a validation set comprising examples with and without the synthetic attack to select the number of components to ablate from the model. (See the editing sketch below the table.)
Hardware Specification | Yes | We train our models and compute COAR attributions on a cluster of machines, each with 9 NVIDIA A100 or V100 GPUs and 96 CPU cores.
Software Dependencies | No | The paper mentions using the captum library, FFCV library, and fast-l1 package, but does not specify their version numbers or other core software dependencies such as Python or PyTorch versions.
Experiment Setup | Yes | Specifically, for a given model, we first construct a component dataset D(z) for each example z (as in Step 1 of Section 3) by randomly ablating an α_train fraction of all components and evaluating the resulting margin (5) on z, where α_train = {10%, 5%, 5%} for setups {A, B, C} above. We repeat this m times, yielding a component dataset D(z) of size m for each example z; we use m = {50000, 100000, 200000} for setups {A, B, C} above. We choose to ablate component subsets C′ ⊆ C by simply setting the parameters of the components in C′ to zero (Wang et al., 2022; Olsson et al., 2022). (See the attribution sketch below the table.)
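
For concreteness, the ablate-and-regress procedure quoted in the Pseudocode and Experiment Setup rows can be illustrated as follows. This is a minimal sketch, not the authors' released implementation (see github.com/MadryLab/modelcomponents): it assumes the model's components are available as a list of parameter tensors, and the helper names (ablate_and_eval, coar_attribution, margin_fn) are hypothetical.

```python
# Minimal sketch of COAR-style component attribution for a single example.
# Assumes `components` is a list of torch parameter tensors (one per component)
# and `margin_fn(model, example, label)` returns the correct-class margin.
import numpy as np
import torch
from sklearn.linear_model import Ridge

def ablate_and_eval(model, components, keep_mask, example, label, margin_fn):
    """Zero the parameters of masked-out components, evaluate the margin on one
    example, then restore the original parameters."""
    saved = [c.detach().clone() for c in components]
    with torch.no_grad():
        for c, keep in zip(components, keep_mask):
            if not keep:          # ablation = set the component's parameters to zero
                c.zero_()
        margin = float(margin_fn(model, example, label))
        for c, orig in zip(components, saved):  # restore the model
            c.copy_(orig)
    return margin

def coar_attribution(model, components, example, label, margin_fn,
                     ablate_frac=0.10, num_samples=50_000, seed=0):
    """Step 1: build a component dataset of (ablation mask, margin) pairs by
    randomly ablating an `ablate_frac` fraction of components, `num_samples` times.
    Step 2: fit a linear map from masks to margins; its weights are the
    per-component attributions for this example."""
    rng = np.random.default_rng(seed)
    n = len(components)
    masks = np.ones((num_samples, n), dtype=bool)
    margins = np.empty(num_samples)
    for i in range(num_samples):
        ablated = rng.choice(n, size=int(ablate_frac * n), replace=False)
        masks[i, ablated] = False
        margins[i] = ablate_and_eval(model, components, masks[i],
                                     example, label, margin_fn)
    reg = Ridge(alpha=1.0).fit(masks.astype(np.float32), margins)
    return reg.coef_, reg.intercept_   # one attribution score per component
```

With the Setup A values quoted above (α_train = 10%, m = 50,000, |C| = 2,306), this would produce a 2,306-dimensional attribution vector for each CIFAR-10 test example.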
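
The Dataset Splits row describes fixing individual misclassifications by ablating a small, example-specific set of components, with the number of ablated components chosen on a validation set. The sketch below shows one way attributions from the function above could drive such an edit; the callback ablate_and_score and the selection rule are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of a targeted edit: ablate the components whose attributions
# push the target example's margin down most, picking k on a held-out set.
import numpy as np

def edit_for_example(attributions, candidate_ks, ablate_and_score):
    """attributions: per-component scores for the target example (from coar_attribution).
    ablate_and_score(component_ids) -> (target_fixed: bool, val_accuracy: float) is an
    assumed callback that zeroes those components and evaluates the edited model.
    Returns the edit that fixes the target example with the best validation accuracy."""
    order = np.argsort(attributions)   # most negative (margin-reducing) components first
    best = None
    for k in candidate_ks:
        fixed, val_acc = ablate_and_score(order[:k])
        if fixed and (best is None or val_acc > best[1]):
            best = (order[:k], val_acc)
    return best  # (component ids to ablate, validation accuracy) or None if no k works
```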