Scaling Symbolic Methods using Gradients for Neural Model Explanation

Authors: Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley

ICLR 2021

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental. "We evaluate our technique on three datasets, MNIST, ImageNet, and Beer Reviews, and demonstrate both quantitatively and qualitatively that the regions generated by our approach are sparser and achieve higher saliency scores compared to the gradient-based methods alone."

Researcher Affiliation: Industry. Google Research; {subhamsahoo,vsubhashini,leeley,rising,pfr}@google.com

Pseudocode: No. The paper describes its methodology with mathematical equations and prose but does not include any explicitly labeled pseudocode or algorithm blocks.

Open Source Code: Yes. Code and examples are at https://github.com/google-research/google-research/tree/master/smug_saliency

Open Datasets: Yes. "Datasets. We empirically evaluate SMUG on two image datasets, MNIST (LeCun et al., 2010) and ImageNet (Deng et al., 2009), as well as a text dataset of Beer Reviews (McAuley et al., 2012)."

Dataset Splits: Yes. "Beer Reviews: To evaluate SMUG on a textual task we consider the review-rating prediction task on the Beer Reviews dataset, consisting of 70k training examples, 3k validation examples, and 7k test examples. ImageNet: We use 3304 images (224×224) with ground-truth bounding boxes from the validation set of ImageNet. MNIST: For 100 images chosen randomly from the validation set, the SMT solver could solve the constraint shown in Eq. 6 (returns SAT) for only 41 of the images."

Hardware Specification: No. The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.

Software Dependencies: No. The paper mentions the z3 solver as a tool but does not give version numbers for any software dependencies or libraries. (An illustrative z3 sketch appears after this table.)

Experiment Setup: Yes. "We use a feedforward model consisting of one hidden layer with 32 nodes (ReLU activation) and 10 output nodes with sigmoid, one each for the 10 digits (0-9). ... we set k = 3000, γ = 0 for ImageNet, and k = 100, γ = 0 for text experiments ... Further, each masking variable M_ij is used to represent a 4×4 grid of pixels instead of a single pixel (to reduce running time)." (A minimal sketch of this model appears after this table.)
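To make the quoted experiment setup concrete, here is a minimal sketch of the MNIST classifier described in the Experiment Setup row, assuming a TensorFlow/Keras implementation. The layer sizes and activations come from the quote; the optimizer and loss are illustrative assumptions, not details confirmed by the paper.

```python
# Minimal sketch of the MNIST model quoted above: one hidden layer with
# 32 ReLU units and 10 sigmoid outputs, one per digit (0-9).
# Assumes TensorFlow/Keras; the paper's actual training code may differ.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 784 input pixels
    tf.keras.layers.Dense(32, activation="relu"),     # single hidden layer
    tf.keras.layers.Dense(10, activation="sigmoid"),  # one output per digit
])
model.compile(
    optimizer="adam",                # assumption: optimizer not specified
    loss="binary_crossentropy",      # per-class BCE matches sigmoid outputs
    metrics=["accuracy"],
)
model.summary()
```

The independent sigmoid outputs (rather than a softmax) follow the quote's one-output-per-digit description, which is why per-class binary cross-entropy is the natural stand-in loss here.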
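Since the paper relies on the z3 SMT solver (with no version pinned, per the Software Dependencies row), the sketch below shows the SAT-check pattern the Dataset Splits quote refers to, using the z3-solver Python bindings. The boolean variables M_i_j and the sparsity constraint are illustrative stand-ins; the paper's actual Eq. 6 encodes the masked network's output behavior and is not reproduced here.

```python
# Hedged illustration of posing a masking constraint to the z3 SMT solver
# (pip install z3-solver). The constraint is a simplified stand-in, NOT the
# paper's Eq. 6: it only demonstrates the SAT/UNSAT check pattern, with one
# boolean mask variable per 4x4 block of a 28x28 MNIST image.
import z3

GRID = 28 // 4  # 7x7 grid of 4x4 pixel blocks
M = [[z3.Bool(f"M_{i}_{j}") for j in range(GRID)] for i in range(GRID)]

solver = z3.Solver()
# Toy sparsity constraint: keep at most 5 blocks unmasked. A real encoding
# would also constrain the masked network's logits, as in the paper.
solver.add(z3.AtMost(*[M[i][j] for i in range(GRID) for j in range(GRID)], 5))

if solver.check() == z3.sat:
    assignment = solver.model()
    kept = [(i, j) for i in range(GRID) for j in range(GRID)
            if z3.is_true(assignment.eval(M[i][j], model_completion=True))]
    print("SAT; unmasked blocks:", kept)
else:
    print("UNSAT for this image")  # the quoted 59/100 MNIST failure case
```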