Agree to Disagree: Diversity through Disagreement for Better Transferability
Authors: Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show how D-BAT naturally emerges from the notion of generalized discrepancy, as well as demonstrate in multiple experiments how the proposed method can mitigate shortcut-learning, enhance uncertainty and OOD detection, as well as improve transferability. ... [Section 4, Experiments] We conduct two main types of experiments: (i) we evaluate how D-BAT can mitigate shortcut learning, bypassing simplicity bias, and generalize to OOD distributions, and (ii) we test the uncertainty estimation and OOD detection capabilities of D-BAT models. |
| Researcher Affiliation | Academia | Matteo Pagliardini (EPFL); Martin Jaggi (EPFL); François Fleuret (EPFL); Sai Praneeth Karimireddy (EPFL & UC Berkeley) |
| Pseudocode | Yes | Algorithm 1: D-BAT for binary classification ... Algorithm 2: D-BAT for multi-class classification (a hedged sketch of the binary disagreement objective appears after this table) |
| Open Source Code | Yes | Link to the source code to reproduce our experiments: https://github.com/mpagli/Agree-to-Disagree |
| Open Datasets | Yes | The Colored-MNIST, or C-MNIST for short, consists of MNIST (LeCun & Cortes, 1998) images... Fashion-MNIST (Xiao et al., 2017)... CIFAR-10 (Krizhevsky, 2009)... We use the Waterbirds (Sagawa et al., 2020) and Camelyon17 (Bandi et al., 2018) datasets from the WILDS collection (Koh et al., 2021). We also use the Office-Home dataset from Venkateswara et al. (2017). |
| Dataset Splits | Yes | For the Camelyon17 medical imaging dataset, we use unlabeled validation data instead of unlabeled test data, both coming from different hospitals. For the Office-Home dataset, we use the left-out Art domain as D_ood. ... We use the train/validation/test splits provided by the WILDS library. ... For our D-BAT experiments we only consider the case where we have access to unlabeled target data. We use the validation split as it is from the same distribution as the target data. (See the WILDS loading sketch after this table.) |
| Hardware Specification | Yes | For the Camelyon17, Waterbirds and Office-Home datasets, which use ResNet-50 or ResNet-18 architectures, we used a V100 Nvidia GPU, and the hyperparameter search and training took about two weeks. |
| Software Dependencies | No | The paper mentions using the 'AdamW' and 'SGD' optimizers and the 'ResNet-18', 'ResNet-50', and 'LeNet' models, but it does not specify version numbers for any software libraries (e.g., Python, PyTorch, TensorFlow) or specific software tools used to implement or run the experiments. |
| Experiment Setup | Yes | We train for 60 epochs with a fixed learning rate of 0.001 and SGD as the optimizer. We use an l2 penalty term of 0.0001 and a momentum term β = 0.9. For D-BAT, we tune α ∈ {10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6} and found α = 10^-6 to be best. For each set of hyperparameters, we train a deep-ensemble and a D-BAT ensemble of size 2, and select the parameters associated with the highest averaged validation accuracy over the two predictors of the ensemble. Our results are obtained by averaging over 3 seeds. (See the optimizer-setup sketch after this table.) |
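
The Pseudocode row above points to Algorithm 1 (D-BAT for binary classification). The snippet below is a minimal PyTorch sketch of that binary disagreement objective: the second predictor is trained with a standard cross-entropy loss on labeled in-distribution data plus an α-weighted term that pushes it to disagree with a frozen first predictor on unlabeled OOD data. Function names, tensor shapes, and the 1e-7 numerical floor are illustrative assumptions, not taken from the authors' repository.

```python
import torch
import torch.nn.functional as F

def dbat_disagreement(logits_new, logits_frozen, eps=1e-7):
    """Binary disagreement term: -log(p1*(1-p2) + (1-p1)*p2),
    where p1, p2 are the two models' probabilities for the positive class."""
    p_new = torch.sigmoid(logits_new)
    p_frozen = torch.sigmoid(logits_frozen).detach()  # first predictor stays frozen
    p_disagree = p_frozen * (1 - p_new) + (1 - p_frozen) * p_new
    return -torch.log(p_disagree + eps).mean()

def dbat_objective(model_new, model_frozen, x_id, y_id, x_ood, alpha):
    """Task loss on labeled in-distribution data plus alpha-weighted
    disagreement on unlabeled OOD data (hypothetical helper)."""
    task_loss = F.binary_cross_entropy_with_logits(
        model_new(x_id).squeeze(-1), y_id.float())
    dis_loss = dbat_disagreement(
        model_new(x_ood).squeeze(-1), model_frozen(x_ood).squeeze(-1))
    return task_loss + alpha * dis_loss
```

Minimizing the disagreement term drives the two predictors toward opposite labels on the unlabeled OOD points while the task loss keeps the new predictor accurate in-distribution, which is the trade-off the α weight controls.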
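The Dataset Splits row states that the splits come from the WILDS library, with the Camelyon17 validation split (a different hospital) serving as the unlabeled OOD data for D-BAT. A minimal loading sketch using the public wilds package is shown below; the transform and batch size are placeholder choices, not values from the paper.

```python
from wilds import get_dataset
from wilds.common.data_loaders import get_train_loader
import torchvision.transforms as T

# Camelyon17 with the train/validation/test splits provided by WILDS.
dataset = get_dataset(dataset="camelyon17", download=True)
transform = T.Compose([T.ToTensor()])  # placeholder preprocessing

train_data = dataset.get_subset("train", transform=transform)  # labeled in-distribution data
val_data = dataset.get_subset("val", transform=transform)      # labels ignored: unlabeled OOD data for D-BAT
test_data = dataset.get_subset("test", transform=transform)    # held-out target distribution

train_loader = get_train_loader("standard", train_data, batch_size=32)
```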
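The Experiment Setup row reports 60 epochs of SGD with a fixed learning rate of 0.001, momentum 0.9, and an l2 penalty of 0.0001. The configuration sketch below wires those numbers into a PyTorch optimizer; the ResNet-18 backbone with a single-logit head is an assumption (the paper uses LeNet, ResNet-18, or ResNet-50 depending on the dataset), and α echoes the 10^-6 quoted in that row.

```python
import torch
from torchvision.models import resnet18

# Backbone is an assumption; the paper varies the architecture per dataset.
model = resnet18(num_classes=1)  # single logit for binary classification

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-3,            # fixed learning rate of 0.001
    momentum=0.9,       # momentum term beta = 0.9
    weight_decay=1e-4,  # l2 penalty term of 0.0001
)

num_epochs = 60
alpha = 1e-6  # disagreement weight reported as best after tuning
```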