Examples are not Enough, Learn to Criticize! Criticism for Interpretability
Authors: Been Kim, Rajiv Khanna, Oluwasanmi O. Koyejo
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A human subject pilot study shows that the MMD-critic selects prototypes and criticism that are useful to facilitate human understanding and reasoning. We also evaluate the prototypes selected by MMD-critic via a nearest prototype classifier, showing competitive performance compared to baselines. (See the nearest-prototype sketch after the table.) |
| Researcher Affiliation | Collaboration | Been Kim (Allen Institute for AI, beenkim@csail.mit.edu); Rajiv Khanna (UT Austin, rajivak@utexas.edu); Oluwasanmi Koyejo (UIUC, sanmi@illinois.edu) |
| Pseudocode | Yes | Algorithm 1: Greedy algorithm, max F(S) s.t. \|S\| ≤ m (a runnable sketch follows the table) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the MMD-critic implementation or a link to a code repository. |
| Open Datasets | Yes | We present results for the proposed technique MMD-critic using the USPS handwritten digits (Hull, 1994) and ImageNet (Deng et al., 2009) datasets. The USPS handwritten digits dataset (Hull, 1994) consists of n = 7291 training (and 2007 test) greyscale images of 10 handwritten digits, 0 through 9. |
| Dataset Splits | Yes | The kernel hyperparameter γ was chosen to maximize the average cross-validated classification performance, then fixed for all other experiments. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., GPU models, CPU types, memory). It only mentions general concepts like 'image embeddings'. |
| Software Dependencies | No | The paper mentions using 'radial basis function (RBF) kernel' but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | The kernel hyperparameter γ was chosen to maximize the average cross-validated classification performance, then fixed for all other experiments. (See the cross-validation sketch after the table.) |
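
The Pseudocode row points to Algorithm 1, a greedy maximization of F(S) subject to \|S\| ≤ m. Below is a minimal sketch of that greedy loop for the MMD-based prototype objective, using the RBF kernel the paper mentions. The function names (`rbf_kernel`, `greedy_prototypes`) and the naive per-candidate re-evaluation of F are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Pairwise RBF kernel matrix k(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def greedy_prototypes(K, m):
    """Greedily grow S to maximize the MMD-based prototype objective
    F(S) = (2 / (n|S|)) * sum_{i in [n], j in S} K[i, j]
         - (1 / |S|^2) * sum_{j, j' in S} K[j, j'],
    stopping once |S| = m (the cardinality constraint in Algorithm 1)."""
    n = K.shape[0]
    col_means = K.mean(axis=0)        # (1/n) * sum_i K[i, j] for each j
    selected, candidates = [], set(range(n))
    for _ in range(m):
        best_val, best_j = -np.inf, None
        for j in candidates:          # naive: re-evaluate F for each candidate
            S = selected + [j]
            s = len(S)
            val = 2.0 / s * col_means[S].sum() - K[np.ix_(S, S)].sum() / s ** 2
            if val > best_val:
                best_val, best_j = val, j
        selected.append(best_j)
        candidates.remove(best_j)
    return selected
```

For example, `greedy_prototypes(rbf_kernel(X, gamma), m=10)` returns the indices of ten prototypes. An efficient implementation would update the two sums incrementally per added point rather than recomputing the submatrix sums, but the naive form keeps the objective visible.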
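The Research Type row also mentions evaluating the selected prototypes via a nearest prototype classifier. Here is a minimal sketch assuming squared Euclidean distance in input space (a kernel-induced distance is a plausible alternative reading of the paper); `nearest_prototype_predict` is a hypothetical name.

```python
import numpy as np

def nearest_prototype_predict(X_test, X_proto, y_proto):
    """Assign each test point the label of its closest prototype.
    Distance is squared Euclidean here; the paper may instead use
    the kernel-induced distance."""
    d2 = (np.sum(X_test ** 2, axis=1)[:, None]
          + np.sum(X_proto ** 2, axis=1)[None, :]
          - 2.0 * X_test @ X_proto.T)
    return y_proto[np.argmin(d2, axis=1)]
```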
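The Dataset Splits and Experiment Setup rows both quote the γ-selection protocol: γ was chosen to maximize average cross-validated classification performance, then fixed. The paper does not name the classifier or the candidate grid, so both are assumptions in this sketch (an RBF-kernel SVM via scikit-learn and a small log-spaced grid).

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def select_gamma(X, y, gammas, cv=5):
    """Return the RBF bandwidth with the best mean cross-validated
    accuracy. The classifier (SVC) and candidate grid are assumptions;
    the paper states only the cross-validation criterion."""
    mean_scores = {
        g: cross_val_score(SVC(kernel="rbf", gamma=g), X, y, cv=cv).mean()
        for g in gammas
    }
    return max(mean_scores, key=mean_scores.get)

# e.g. gamma = select_gamma(X_train, y_train, gammas=[1e-3, 1e-2, 1e-1, 1.0])
```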