Examples are not Enough, Learn to Criticize! Criticism for Interpretability
Authors: Been Kim, Rajiv Khanna, Oluwasanmi O. Koyejo
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A human subject pilot study shows that the MMD-critic selects prototypes and criticism that are useful to facilitate human understanding and reasoning. We also evaluate the prototypes selected by MMD-critic via a nearest prototype classifier, showing competitive performance compared to baselines. (See the nearest-prototype sketch after the table.) |
| Researcher Affiliation | Collaboration | Been Kim (Allen Institute for AI, beenkim@csail.mit.edu); Rajiv Khanna (UT Austin, rajivak@utexas.edu); Oluwasanmi Koyejo (UIUC, sanmi@illinois.edu) |
| Pseudocode | Yes | Algorithm 1: Greedy algorithm, max F(S) s.t. \|S\| ≤ m (a runnable sketch follows the table) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the MMD-critic implementation or a link to a code repository. |
| Open Datasets | Yes | We present results for the proposed technique MMD-critic using the USPS handwritten digits (Hull, 1994) and ImageNet (Deng et al., 2009) datasets. The USPS handwritten digits dataset (Hull, 1994) consists of n = 7291 training (and 2007 test) greyscale images of 10 handwritten digits, 0 through 9. |
| Dataset Splits | Yes | The kernel hyperparameter γ was chosen to maximize the average cross-validated classification performance, then fixed for all other experiments. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., GPU models, CPU types, memory). It only mentions general concepts like 'image embeddings'. |
| Software Dependencies | No | The paper mentions using 'radial basis function (RBF) kernel' but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | The kernel hyperparameter γ was chosen to maximize the average cross-validated classification performance, then fixed for all other experiments. (See the cross-validation sketch after the table.) |
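
The Pseudocode row points to Algorithm 1, a greedy maximization of F(S) subject to \|S\| ≤ m. Below is a minimal sketch of that greedy loop for the MMD-based prototype objective, using the RBF kernel the paper mentions. The function names (`rbf_kernel`, `greedy_prototypes`) and the naive per-candidate re-evaluation of F are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Pairwise RBF kernel matrix k(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def greedy_prototypes(K, m):
    """Greedily grow S to maximize the MMD-based prototype objective
    F(S) = (2 / (n|S|)) * sum_{i in [n], j in S} K[i, j]
         - (1 / |S|^2) * sum_{j, j' in S} K[j, j'],
    stopping once |S| = m (the cardinality constraint in Algorithm 1)."""
    n = K.shape[0]
    col_means = K.mean(axis=0)        # (1/n) * sum_i K[i, j] for each j
    selected, candidates = [], set(range(n))
    for _ in range(m):
        best_val, best_j = -np.inf, None
        for j in candidates:          # naive: re-evaluate F for each candidate
            S = selected + [j]
            s = len(S)
            val = 2.0 / s * col_means[S].sum() - K[np.ix_(S, S)].sum() / s ** 2
            if val > best_val:
                best_val, best_j = val, j
        selected.append(best_j)
        candidates.remove(best_j)
    return selected
```

For example, `greedy_prototypes(rbf_kernel(X, gamma), m=10)` returns the indices of ten prototypes. An efficient implementation would update the two sums incrementally per added point rather than recomputing the submatrix sums, but the naive form keeps the objective visible.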
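The Research Type row also mentions evaluating the selected prototypes via a nearest prototype classifier. Here is a minimal sketch assuming squared Euclidean distance in input space (a kernel-induced distance is a plausible alternative reading of the paper); `nearest_prototype_predict` is a hypothetical name.

```python
import numpy as np

def nearest_prototype_predict(X_test, X_proto, y_proto):
    """Assign each test point the label of its closest prototype.
    Distance is squared Euclidean here; the paper may instead use
    the kernel-induced distance."""
    d2 = (np.sum(X_test ** 2, axis=1)[:, None]
          + np.sum(X_proto ** 2, axis=1)[None, :]
          - 2.0 * X_test @ X_proto.T)
    return y_proto[np.argmin(d2, axis=1)]
```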
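The Dataset Splits and Experiment Setup rows both quote the γ-selection protocol: γ was chosen to maximize average cross-validated classification performance, then fixed. The paper does not name the classifier or the candidate grid, so both are assumptions in this sketch (an RBF-kernel SVM via scikit-learn and a small log-spaced grid).

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def select_gamma(X, y, gammas, cv=5):
    """Return the RBF bandwidth with the best mean cross-validated
    accuracy. The classifier (SVC) and candidate grid are assumptions;
    the paper states only the cross-validation criterion."""
    mean_scores = {
        g: cross_val_score(SVC(kernel="rbf", gamma=g), X, y, cv=cv).mean()
        for g in gammas
    }
    return max(mean_scores, key=mean_scores.get)

# e.g. gamma = select_gamma(X_train, y_train, gammas=[1e-3, 1e-2, 1e-1, 1.0])
```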