Agree to Disagree: Diversity through Disagreement for Better Transferability
Authors: Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show how D-BAT naturally emerges from the notion of generalized discrepancy, as well as demonstrate in multiple experiments how the proposed method can mitigate shortcut-learning, enhance uncertainty and OOD detection, as well as improve transferability. ... [Section 4, Experiments] We conduct two main types of experiments: (i) we evaluate how D-BAT can mitigate shortcut learning, bypassing simplicity bias, and generalize to OOD distributions, and (ii) we test the uncertainty estimation and OOD detection capabilities of D-BAT models. |
| Researcher Affiliation | Academia | Matteo Pagliardini (EPFL); Martin Jaggi (EPFL); François Fleuret (EPFL); Sai Praneeth Karimireddy (EPFL & UC Berkeley) |
| Pseudocode | Yes | Algorithm 1: D-BAT for binary classification ... Algorithm 2: D-BAT for multi-class classification (a hedged sketch of the binary disagreement objective appears after this table) |
| Open Source Code | Yes | Link to the source code to reproduce our experiments: https://github.com/mpagli/Agree-to-Disagree |
| Open Datasets | Yes | The Colored-MNIST, or C-MNIST for short, consists of MNIST (LeCun & Cortes, 1998) images... Fashion-MNIST (Xiao et al., 2017)... CIFAR-10 (Krizhevsky, 2009)... We use the Waterbirds (Sagawa et al., 2020) and Camelyon17 (Bandi et al., 2018) datasets from the WILDS collection (Koh et al., 2021). We also use the Office-Home dataset from Venkateswara et al. (2017). |
| Dataset Splits | Yes | For the Camelyon17 medical imaging dataset, we use unlabeled validation data instead of unlabeled test data, both coming from different hospitals. For the Office-Home dataset, we use the left-out Art domain as D_ood. ... We use the train/validation/test splits provided by the WILDS library. ... For our D-BAT experiments we only consider the case where we have access to unlabeled target data. We use the validation split as it is from the same distribution as the target data. (See the WILDS loading sketch after this table.) |
| Hardware Specification | Yes | For the Camelyon17, Waterbirds and Office-Home datasets, which use ResNet-50 or ResNet-18 architectures, we used a V100 Nvidia GPU, and the hyperparameter search and training took about two weeks. |
| Software Dependencies | No | The paper mentions using the 'AdamW' and 'SGD' optimizers and the 'ResNet-18', 'ResNet-50', and 'LeNet' models, but it does not specify version numbers for any software libraries (e.g., Python, PyTorch, TensorFlow) or specific software tools used to implement or run the experiments. |
| Experiment Setup | Yes | We train for 60 epochs with a fixed learning rate of 0.001 and SGD as the optimizer. We use an l2 penalty term of 0.0001 and a momentum term β = 0.9. For D-BAT, we tune α ∈ {10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6} and found α = 10^-6 to be best. For each set of hyperparameters, we train a deep-ensemble and a D-BAT ensemble of size 2, and select the parameters associated with the highest averaged validation accuracy over the two predictors of the ensemble. Our results are obtained by averaging over 3 seeds. (See the optimizer-setup sketch after this table.) |
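
The Pseudocode row above points to Algorithm 1 (D-BAT for binary classification). The snippet below is a minimal PyTorch sketch of that binary disagreement objective: the second predictor is trained with a standard cross-entropy loss on labeled in-distribution data plus an α-weighted term that pushes it to disagree with a frozen first predictor on unlabeled OOD data. Function names, tensor shapes, and the 1e-7 numerical floor are illustrative assumptions, not taken from the authors' repository.

```python
import torch
import torch.nn.functional as F

def dbat_disagreement(logits_new, logits_frozen, eps=1e-7):
    """Binary disagreement term: -log(p1*(1-p2) + (1-p1)*p2),
    where p1, p2 are the two models' probabilities for the positive class."""
    p_new = torch.sigmoid(logits_new)
    p_frozen = torch.sigmoid(logits_frozen).detach()  # first predictor stays frozen
    p_disagree = p_frozen * (1 - p_new) + (1 - p_frozen) * p_new
    return -torch.log(p_disagree + eps).mean()

def dbat_objective(model_new, model_frozen, x_id, y_id, x_ood, alpha):
    """Task loss on labeled in-distribution data plus alpha-weighted
    disagreement on unlabeled OOD data (hypothetical helper)."""
    task_loss = F.binary_cross_entropy_with_logits(
        model_new(x_id).squeeze(-1), y_id.float())
    dis_loss = dbat_disagreement(
        model_new(x_ood).squeeze(-1), model_frozen(x_ood).squeeze(-1))
    return task_loss + alpha * dis_loss
```

Minimizing the disagreement term drives the two predictors toward opposite labels on the unlabeled OOD points while the task loss keeps the new predictor accurate in-distribution, which is the trade-off the α weight controls.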
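The Dataset Splits row states that the splits come from the WILDS library, with the Camelyon17 validation split (a different hospital) serving as the unlabeled OOD data for D-BAT. A minimal loading sketch using the public wilds package is shown below; the transform and batch size are placeholder choices, not values from the paper.

```python
from wilds import get_dataset
from wilds.common.data_loaders import get_train_loader
import torchvision.transforms as T

# Camelyon17 with the train/validation/test splits provided by WILDS.
dataset = get_dataset(dataset="camelyon17", download=True)
transform = T.Compose([T.ToTensor()])  # placeholder preprocessing

train_data = dataset.get_subset("train", transform=transform)  # labeled in-distribution data
val_data = dataset.get_subset("val", transform=transform)      # labels ignored: unlabeled OOD data for D-BAT
test_data = dataset.get_subset("test", transform=transform)    # held-out target distribution

train_loader = get_train_loader("standard", train_data, batch_size=32)
```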
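The Experiment Setup row reports 60 epochs of SGD with a fixed learning rate of 0.001, momentum 0.9, and an l2 penalty of 0.0001. The configuration sketch below wires those numbers into a PyTorch optimizer; the ResNet-18 backbone with a single-logit head is an assumption (the paper uses LeNet, ResNet-18, or ResNet-50 depending on the dataset), and α echoes the 10^-6 quoted in that row.

```python
import torch
from torchvision.models import resnet18

# Backbone is an assumption; the paper varies the architecture per dataset.
model = resnet18(num_classes=1)  # single logit for binary classification

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-3,            # fixed learning rate of 0.001
    momentum=0.9,       # momentum term beta = 0.9
    weight_decay=1e-4,  # l2 penalty term of 0.0001
)

num_epochs = 60
alpha = 1e-6  # disagreement weight reported as best after tuning
```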