Saliency is a Possible Red Herring When Diagnosing Poor Generalization
Authors: Joseph D Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study multiple methods that take advantage of such auxiliary labels, by training networks to ignore distracting features which may be found outside of the region of interest. This mask information is only used during training and has an impact on generalization accuracy depending on the severity of the shift between the training and test distributions. Surprisingly, while these methods improve generalization performance in the presence of a covariate shift, there is no strong correspondence between the correction of attribution towards the features a human expert has labelled as important and generalization performance. (Abstract; see also Section 4, Experiments.) A hedged sketch of this masking idea follows the table. |
| Researcher Affiliation | Collaboration | Joseph D. Viviano (1,2), Becks Simpson (1), Francis Dutil (2), Yoshua Bengio (1,3), & Joseph Paul Cohen (1); 1: Mila, Québec Artificial Intelligence Institute, Université de Montréal; 2: Imagia Cybernetics; 3: CIFAR Senior Fellow |
| Pseudocode | No | The paper does not contain explicit pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All code and datasets for this paper are publicly available1. 1https://github.com/josephdviviano/saliency-red-herring |
| Open Datasets | Yes | We generated a dataset with these conditions where class membership is denoted by the number of crosses present in the image (1 vs. 2 crosses)...; We introduced a covariate-shift into a joint dataset of X-Rays drawn from two different imaging centres: the PadChest (Bustos et al., 2019) sample and the NIH ChestX-ray8 (Wang et al., 2017) sample.; Here we made use of the RSNA Pneumonia challenge (Shih et al., 2019) dataset... A minimal sketch of the synthetic crosses dataset follows the table. |
| Dataset Splits | Yes | This dataset had 500 training, 128 validation, and 128 test examples.; N_train = 5542, N_valid = 2770, N_test = 2770.; N_train = 2696, N_valid = 1348, N_test = 1348. |
| Hardware Specification | No | The paper states that 'This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec,' but does not provide specific hardware details such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using Torchvision, the TorchXRayVision library, and the Captum library, but does not provide specific version numbers for these or other software dependencies. A version-agnostic Captum saliency example follows the table. |
| Experiment Setup | Yes | Hyperparameters were selected for all models using a Bayesian hyperparameter search with a single seed, 5 random initializations, 20 iterations, and trained for a maximum of 100 epochs with a patience of 20. The learning rate for all models was searched on a log-uniform scale over [10⁻⁵, 10⁻²]. The ActDiff, Adversarial, GradMask, and RRR lambdas were all searched over [10⁻⁴, 10], each on a log-uniform scale. For Adversarial, we searched for the optimal discriminator : encoder training ratio in [2:1, 10:1], and the discriminator learning rate was searched on a log-uniform scale over [10⁻⁴, 10⁻²]. For all experiments, the batch size was 16 and the weight of the classification cross-entropy loss was 1. Table 3 of the paper lists the best hyperparameters found for each search. A sketch of this search space follows the table. |
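
The Research Type row quotes the paper's core idea: auxiliary masks are used only at training time to discourage the network from relying on features outside the region of interest. Below is a minimal PyTorch sketch of an RRR-style input-gradient penalty in that spirit; the authors' actual ActDiff, Adversarial, GradMask, and RRR objectives live in the linked repository and differ in detail, so `rrr_style_loss` and its `lam` weighting are illustrative names only.

```python
import torch
import torch.nn.functional as F

def rrr_style_loss(model, x, y, mask, lam=1.0):
    """Cross-entropy plus a penalty on input gradients falling outside
    the expert-annotated region of interest (mask == 1 inside the ROI).

    Sketch only: the paper's ActDiff / Adversarial / GradMask / RRR
    objectives are implemented in the authors' repository and differ
    in their details.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Input gradient of the summed log-probabilities (as in RRR).
    log_probs = F.log_softmax(logits, dim=1)
    grads = torch.autograd.grad(log_probs.sum(), x, create_graph=True)[0]

    # Penalize attribution mass that lands outside the mask.
    penalty = ((grads * (1.0 - mask)) ** 2).mean()
    return ce + lam * penalty
```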
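The synthetic dataset in the Open Datasets row is defined by the number of crosses in the image (1 vs. 2), with masks marking the crosses themselves. The NumPy sketch below shows one way such an example could be generated; the image size, arm length, and the distractor features that create the paper's covariate shift are assumptions, not the authors' exact construction.

```python
import numpy as np

def make_cross_example(n_crosses, size=64, arm=5, rng=None):
    """Return (image, mask, label) with `n_crosses` plus-shaped crosses.

    Loose sketch: image size, arm length, and the distractor features
    that create the paper's covariate shift are not reproduced here.
    """
    rng = rng or np.random.default_rng()
    img = np.zeros((size, size), dtype=np.float32)
    mask = np.zeros_like(img)
    for _ in range(n_crosses):
        cy, cx = rng.integers(arm, size - arm, size=2)
        img[cy - arm:cy + arm + 1, cx] = 1.0   # vertical arm
        img[cy, cx - arm:cx + arm + 1] = 1.0   # horizontal arm
        mask[cy - arm:cy + arm + 1, cx] = 1.0
        mask[cy, cx - arm:cx + arm + 1] = 1.0
    return img, mask, n_crosses - 1            # label 0: one cross, 1: two crosses
```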
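The Software Dependencies row notes that Captum is used (presumably for the attribution maps) but without a pinned version. The snippet below is a generic, version-agnostic example of computing a gradient saliency map with Captum's `Saliency` attributor; the ResNet-18 backbone and random input are placeholders, not the paper's models or data.

```python
import torch
import torchvision.models as models
from captum.attr import Saliency

model = models.resnet18().eval()        # placeholder classifier (random weights)
saliency = Saliency(model)              # gradient-based attribution

x = torch.rand(1, 3, 224, 224)          # stand-in for a preprocessed X-ray
attr = saliency.attribute(x, target=0)  # d(score of class 0) / d(input pixels)
heatmap = attr.abs().max(dim=1)[0]      # collapse channels into a saliency map
print(heatmap.shape)                    # torch.Size([1, 224, 224])
```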
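The Experiment Setup row lists the search ranges, but the quoted text does not name the Bayesian optimization library. The dictionary below simply restates those ranges in a form any optimizer (e.g. scikit-optimize, Ax) could consume; the key names and the (prior, low, high) encoding are invented for illustration.

```python
# Ranges as reported in the paper's hyperparameter search; key names and
# the (prior, low, high) encoding are illustrative, and the Bayesian
# optimizer itself is not specified in the quoted text.
SEARCH_SPACE = {
    "lr":         ("log-uniform", 1e-5, 1e-2),  # all models
    "lambda":     ("log-uniform", 1e-4, 1e1),   # ActDiff / Adversarial / GradMask / RRR
    "disc_lr":    ("log-uniform", 1e-4, 1e-2),  # Adversarial only
    "disc_ratio": ("integer",     2,    10),    # Adversarial only, k discriminator steps : 1 encoder step
}

FIXED = {
    "batch_size": 16,
    "ce_loss_weight": 1.0,
    "max_epochs": 100,
    "patience": 20,
    "search_iterations": 20,
    "random_inits": 5,
    "seeds": 1,
}
```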