Do Input Gradients Highlight Discriminative Features?

Authors: Harshay Shah, Prateek Jain, Praneeth Netrapalli

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 1. We develop an evaluation framework, DiffROAR, to test assumption (A) on four image classification benchmarks. Our results suggest that (i) input gradients of standard models (i.e., trained on original data) may grossly violate (A), whereas (ii) input gradients of adversarially robust models satisfy (A) reasonably well. 2. We then introduce BlockMNIST, an MNIST-based semi-real dataset that by design encodes a priori knowledge of discriminative features. Our analysis on BlockMNIST leverages this information to validate as well as characterize differences between input gradient attributions of standard and robust models. 3. Finally, we theoretically prove that our empirical findings hold on a simplified version of the BlockMNIST dataset. Specifically, we prove that input gradients of standard one-hidden-layer MLPs trained on this dataset do not highlight instance-specific signal coordinates, thus grossly violating (A). (A minimal sketch of the DiffROAR masking-and-retraining idea appears below the table.)
Researcher Affiliation | Industry | Harshay Shah (Microsoft Research India, harshay@google.com); Prateek Jain (Microsoft Research India, prajain@google.com); Praneeth Netrapalli (Microsoft Research India, pnetrapalli@google.com). Part of the work was completed after joining Google Research India.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We believe that the DiffROAR framework and BlockMNIST datasets serve as sanity checks to audit interpretability methods; code and data available at https://github.com/harshays/inputgradients.
Open Datasets | Yes | We consider four benchmark image classification datasets: SVHN [38], FashionMNIST [39], CIFAR-10 [40] and ImageNet-10 [41]. ImageNet-10 is an open-sourced variant (https://github.com/MadryLab/robustness/) of ImageNet [41]... Our code, along with the proposed datasets, is publicly available at https://github.com/harshays/inputgradients. (An illustrative dataset-loading snippet follows the table.)
Dataset Splits | No | The paper mentions 'unmasked train and test datasets' but does not explicitly provide details about a validation set or its split percentages/counts.
Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running the experiments (e.g., GPU models, CPU types, or cloud compute specifications).
Software Dependencies | No | The paper mentions using MLPs, CNNs, and ResNets, along with PGD adversarial training, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). (A generic PGD adversarial-training sketch is included below the table.)
Experiment Setup | Yes | Unless mentioned otherwise, we train models using stochastic gradient descent (SGD), with momentum 0.9, batch size 256, ℓ2 regularization 0.0005 and initial learning rate 0.1 that decays by a factor of 0.75 every 20 epochs. Additionally, we use standard data augmentation and train models for at most 500 epochs, stopping early if cross-entropy loss on training data goes below 0.001. (See the training-setup sketch below the table.)
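
The DiffROAR evaluation referenced in the Research Type row ranks input coordinates by attribution magnitude, keeps only the top-ranked or bottom-ranked fraction of pixels, retrains on the masked data, and compares the predictive power of the two resulting models. The code below is a minimal PyTorch approximation of that idea, not the authors' released implementation: the helper names (`input_gradient_attribution`, `unmask_by_attribution`, `diffroar_score`) and the caller-supplied `train_and_eval` routine are hypothetical.

```python
import torch
import torch.nn.functional as F

def input_gradient_attribution(model, x, y):
    """Attribution = absolute loss gradient w.r.t. the input, summed over channels."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return grad.abs().sum(dim=1)  # shape (N, H, W)

def unmask_by_attribution(x, attributions, fraction, keep_top):
    """Keep only the top (keep_top=True) or bottom (keep_top=False) `fraction`
    of pixels ranked by attribution; zero out everything else."""
    n, _, h, w = x.shape
    k = max(1, int(fraction * h * w))
    flat = attributions.reshape(n, -1)
    idx = flat.topk(k, dim=1, largest=keep_top).indices
    mask = torch.zeros_like(flat).scatter_(1, idx, 1.0).reshape(n, 1, h, w)
    return x * mask

def diffroar_score(train_and_eval, model, x, y, fraction=0.2):
    """DiffROAR-style gap: accuracy after retraining on top-ranked pixels minus
    accuracy after retraining on bottom-ranked pixels. `train_and_eval(images,
    labels) -> test accuracy` retrains a fresh model and is supplied by the caller."""
    attr = input_gradient_attribution(model, x, y)
    acc_top = train_and_eval(unmask_by_attribution(x, attr, fraction, keep_top=True), y)
    acc_bot = train_and_eval(unmask_by_attribution(x, attr, fraction, keep_top=False), y)
    return acc_top - acc_bot
```

A large positive score would indicate that the attribution method ranks genuinely discriminative features near the top, which is the assumption (A) being tested.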
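
Three of the four benchmarks listed in the Open Datasets row are available through standard torchvision loaders; ImageNet-10 is distributed with the MadryLab robustness repository rather than torchvision. The snippet below is an illustrative loading example (the `root` directory is a placeholder), not part of the authors' code.

```python
import torchvision
import torchvision.transforms as T

root = "./data"  # placeholder download directory
to_tensor = T.ToTensor()

# Benchmarks available in torchvision; ImageNet-10 ships with the
# MadryLab robustness repository and is therefore not loaded here.
svhn_train = torchvision.datasets.SVHN(root, split="train", transform=to_tensor, download=True)
fmnist_train = torchvision.datasets.FashionMNIST(root, train=True, transform=to_tensor, download=True)
cifar_train = torchvision.datasets.CIFAR10(root, train=True, transform=to_tensor, download=True)
```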
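
The robust models discussed under Software Dependencies are obtained with PGD adversarial training. The following is a generic ℓ∞ PGD training step in PyTorch, shown only to make the procedure concrete; the perturbation budget, step size, and number of steps are placeholder values, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, step=2 / 255, n_steps=10):
    """Generic l-infinity PGD; eps, step, and n_steps are placeholders.
    Assumes inputs are scaled to [0, 1]."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(n_steps):
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y):
    """One robust-training step: fit the model on PGD-perturbed inputs."""
    model.train()
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```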
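
The optimization settings quoted in the Experiment Setup row translate directly into a standard training loop. The sketch below assumes PyTorch (the paper does not state its framework); `model` and `train_loader` are placeholders, and the reported batch size of 256 would be configured on the DataLoader.

```python
import torch
import torch.nn.functional as F

def train(model, train_loader, device="cuda"):
    """Reported setup: SGD with momentum 0.9 and weight decay 5e-4, initial LR 0.1
    decayed by a factor of 0.75 every 20 epochs, at most 500 epochs, and early
    stopping once training cross-entropy drops below 0.001."""
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.75)

    for epoch in range(500):
        total_loss, total_examples = 0.0, 0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * x.size(0)
            total_examples += x.size(0)
        scheduler.step()
        if total_loss / total_examples < 0.001:
            break  # reported early-stopping criterion on training cross-entropy
    return model
```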