Saliency is a Possible Red Herring When Diagnosing Poor Generalization
Authors: Joseph D Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study multiple methods that take advantage of such auxiliary labels, by training networks to ignore distracting features which may be found outside of the region of interest. This mask information is only used during training and has an impact on generalization accuracy depending on the severity of the shift between the training and test distributions. Surprisingly, while these methods improve generalization performance in the presence of a covariate shift, there is no strong correspondence between the correction of attribution towards the features a human expert has labelled as important and generalization performance. (Abstract; see also Section 4, Experiments.) A hedged sketch of this masking idea follows the table. |
| Researcher Affiliation | Collaboration | Joseph D. Viviano (1,2), Becks Simpson (1), Francis Dutil (2), Yoshua Bengio (1,3), & Joseph Paul Cohen (1); 1: Mila, Québec Artificial Intelligence Institute, Université de Montréal; 2: Imagia Cybernetics; 3: CIFAR Senior Fellow |
| Pseudocode | No | The paper does not contain explicit pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All code and datasets for this paper are publicly available1. 1https://github.com/josephdviviano/saliency-red-herring |
| Open Datasets | Yes | We generated a dataset with these conditions where class membership is denoted by the number of crosses present in the image (1 vs. 2 crosses)...; We introduced a covariate-shift into a joint dataset of X-Rays drawn from two different imaging centres: the PadChest (Bustos et al., 2019) sample and the NIH ChestX-ray8 (Wang et al., 2017) sample.; Here we made use of the RSNA Pneumonia challenge (Shih et al., 2019) dataset... A minimal sketch of the synthetic crosses dataset follows the table. |
| Dataset Splits | Yes | This dataset had 500 training, 128 validation, and 128 test examples.; N_train = 5542, N_valid = 2770, N_test = 2770.; N_train = 2696, N_valid = 1348, N_test = 1348. |
| Hardware Specification | No | The paper states that 'This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec,' but does not provide specific hardware details such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using Torchvision, the TorchXRayVision library, and the Captum library, but does not provide specific version numbers for these or other software dependencies. A version-agnostic Captum saliency example follows the table. |
| Experiment Setup | Yes | Hyperparameters were selected for all models using a Bayesian hyperparameter search with a single seed, 5 random initializations, 20 iterations, and trained for a maximum of 100 epochs with a patience of 20. The learning rate for all models was searched on a log-uniform scale over [10⁻⁵, 10⁻²]. The ActDiff, Adversarial, GradMask, and RRR lambdas were all searched over [10⁻⁴, 10], each on a log-uniform scale. For Adversarial, we searched for the optimal discriminator : encoder training ratio in [2:1, 10:1], and the discriminator learning rate was searched on a log-uniform scale over [10⁻⁴, 10⁻²]. For all experiments, the batch size was 16 and the weight of the classification cross-entropy loss was 1. Table 3 of the paper lists the best hyperparameters found for each search. A sketch of this search space follows the table. |
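
The Research Type row quotes the paper's core idea: auxiliary masks are used only at training time to discourage the network from relying on features outside the region of interest. Below is a minimal PyTorch sketch of an RRR-style input-gradient penalty in that spirit; the authors' actual ActDiff, Adversarial, GradMask, and RRR objectives live in the linked repository and differ in detail, so `rrr_style_loss` and its `lam` weighting are illustrative names only.

```python
import torch
import torch.nn.functional as F

def rrr_style_loss(model, x, y, mask, lam=1.0):
    """Cross-entropy plus a penalty on input gradients falling outside
    the expert-annotated region of interest (mask == 1 inside the ROI).

    Sketch only: the paper's ActDiff / Adversarial / GradMask / RRR
    objectives are implemented in the authors' repository and differ
    in their details.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Input gradient of the summed log-probabilities (as in RRR).
    log_probs = F.log_softmax(logits, dim=1)
    grads = torch.autograd.grad(log_probs.sum(), x, create_graph=True)[0]

    # Penalize attribution mass that lands outside the mask.
    penalty = ((grads * (1.0 - mask)) ** 2).mean()
    return ce + lam * penalty
```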
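The synthetic dataset in the Open Datasets row is defined by the number of crosses in the image (1 vs. 2), with masks marking the crosses themselves. The NumPy sketch below shows one way such an example could be generated; the image size, arm length, and the distractor features that create the paper's covariate shift are assumptions, not the authors' exact construction.

```python
import numpy as np

def make_cross_example(n_crosses, size=64, arm=5, rng=None):
    """Return (image, mask, label) with `n_crosses` plus-shaped crosses.

    Loose sketch: image size, arm length, and the distractor features
    that create the paper's covariate shift are not reproduced here.
    """
    rng = rng or np.random.default_rng()
    img = np.zeros((size, size), dtype=np.float32)
    mask = np.zeros_like(img)
    for _ in range(n_crosses):
        cy, cx = rng.integers(arm, size - arm, size=2)
        img[cy - arm:cy + arm + 1, cx] = 1.0   # vertical arm
        img[cy, cx - arm:cx + arm + 1] = 1.0   # horizontal arm
        mask[cy - arm:cy + arm + 1, cx] = 1.0
        mask[cy, cx - arm:cx + arm + 1] = 1.0
    return img, mask, n_crosses - 1            # label 0: one cross, 1: two crosses
```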
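The Software Dependencies row notes that Captum is used (presumably for the attribution maps) but without a pinned version. The snippet below is a generic, version-agnostic example of computing a gradient saliency map with Captum's `Saliency` attributor; the ResNet-18 backbone and random input are placeholders, not the paper's models or data.

```python
import torch
import torchvision.models as models
from captum.attr import Saliency

model = models.resnet18().eval()        # placeholder classifier (random weights)
saliency = Saliency(model)              # gradient-based attribution

x = torch.rand(1, 3, 224, 224)          # stand-in for a preprocessed X-ray
attr = saliency.attribute(x, target=0)  # d(score of class 0) / d(input pixels)
heatmap = attr.abs().max(dim=1)[0]      # collapse channels into a saliency map
print(heatmap.shape)                    # torch.Size([1, 224, 224])
```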
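The Experiment Setup row lists the search ranges, but the quoted text does not name the Bayesian optimization library. The dictionary below simply restates those ranges in a form any optimizer (e.g. scikit-optimize, Ax) could consume; the key names and the (prior, low, high) encoding are invented for illustration.

```python
# Ranges as reported in the paper's hyperparameter search; key names and
# the (prior, low, high) encoding are illustrative, and the Bayesian
# optimizer itself is not specified in the quoted text.
SEARCH_SPACE = {
    "lr":         ("log-uniform", 1e-5, 1e-2),  # all models
    "lambda":     ("log-uniform", 1e-4, 1e1),   # ActDiff / Adversarial / GradMask / RRR
    "disc_lr":    ("log-uniform", 1e-4, 1e-2),  # Adversarial only
    "disc_ratio": ("integer",     2,    10),    # Adversarial only, k discriminator steps : 1 encoder step
}

FIXED = {
    "batch_size": 16,
    "ce_loss_weight": 1.0,
    "max_epochs": 100,
    "patience": 20,
    "search_iterations": 20,
    "random_inits": 5,
    "seeds": 1,
}
```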