Explaining Probabilistic Models with Distributional Values
Authors: Luca Franceschi, Michele Donini, Cedric Archambeau, Matthias Seeger
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we showcase applications to image classifier and autoregressive language models (Section 5). We train a random forest binary classifier f on the Adult income dataset (Appendix D.5). |
| Researcher Affiliation | Industry | 1Amazon Web Services, Berlin, Germany 2Helsing, Berlin, Germany. Correspondence to: Luca Franceschi <franuluc@amazon.de>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Python code is available at https://github.com/amazon-science/explaining-probabilistic-models-with-distributinal-values. |
| Open Datasets | Yes | Iris dataset, MNIST (LeCun et al., 1998), ImageNet (Deng et al., 2009), Adult income dataset |
| Dataset Splits | No | For concreteness, we take as running examples the tasks of explaining the output of a logistic multiclass classifier f(x) = Softmax(x W + b) trained on the Iris dataset and the XOR game of Example 3.5. Test images from MNIST (LeCun et al., 1998) and ImageNet (Deng et al., 2009). We train a random forest binary classifier f on the Adult income dataset and compute the Bernoulli Shapley value (BSV) for one misclassified test instance. The paper uses standard datasets but does not specify the exact training/validation/test splits used for reproducibility. |
| Hardware Specification | Yes | We run all the experiments on a machine with 8 Intel(R) Xeon(R) Platinum 8259CL CPUs @ 2.50GHz and one Nvidia(R) Tesla(R) T4 GPU. |
| Software Dependencies | No | Python code is available at https://github.com/amazon-science/explaining-probabilistic-models-with-distributinal-values. The paper does not specify versions for Python libraries or other software dependencies. |
| Experiment Setup | Yes | To compute both the standard and Categorical SV, we use a simple permutation-based 1000-samples Monte Carlo estimator (Strumbelj & Kononenko, 2010). For out-of-coalition pixels, we use a reference value of 0. We compute average categorical differences between output given prompts with female versus male subject. We restrict the output to a number of tokens in the order of 100 (depending on the sentence), picking a mix of manually selected, most probable (for a GPT-2 model) and ChatGPT-generated short continuations. |
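The permutation-based Monte Carlo Shapley estimator quoted in the experiment setup (Strumbelj & Kononenko, 2010) can be sketched as below. This is an illustrative reconstruction, not the paper's code: the function name, the linear toy model, and the sample count are assumptions; the paper applies the same scheme with 1000 samples and a reference value of 0 for out-of-coalition pixels.

```python
import numpy as np

def mc_shapley(f, x, reference, n_samples=1000, seed=None):
    """Permutation-based Monte Carlo estimate of Shapley values.

    Features outside the current coalition are replaced by the
    corresponding entries of `reference` (e.g. 0 for pixels).
    """
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)
        z = reference.astype(float).copy()  # start from the empty coalition
        prev = f(z)
        for i in perm:
            z[i] = x[i]                     # add feature i to the coalition
            curr = f(z)
            phi[i] += curr - prev           # marginal contribution of feature i
            prev = curr
    return phi / n_samples

# Toy check with a hypothetical linear model, whose Shapley values are
# known in closed form: phi_i = w_i * (x_i - reference_i).
w = np.array([1.0, 2.0, -3.0])
f = lambda z: float(w @ z)
x = np.array([1.0, 1.0, 1.0])
ref = np.zeros(3)                           # reference value of 0, as in the paper
phi = mc_shapley(f, x, ref, n_samples=200, seed=0)
# For a linear model every permutation gives the same marginal contributions,
# so the estimate is exact: phi == [1.0, 2.0, -3.0].
```

The paper's Categorical/Bernoulli Shapley values replace the scalar differences `curr - prev` with differences between output distributions; the permutation-sampling loop itself is unchanged.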