Domain Adversarial Training: A Game Perspective
Authors: David Acuna, Marc T Law, Guojun Zhang, Sanja Fidler
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that in conjunction with state-of-the-art domain-adversarial methods, we achieve up to 3.5% improvement with less than half of training iterations. Our optimizers are easy to implement, free of additional parameters, and can be plugged into any domain-adversarial framework. |
| Researcher Affiliation | Collaboration | David Acuna (1,2,3), Marc T. Law (2), Guojun Zhang (3,4), Sanja Fidler (1,2,3); 1: University of Toronto, 2: NVIDIA, 3: Vector Institute, 4: University of Waterloo |
| Pseudocode | Yes | Algorithm 1 Pseudo-Code of the Proposed Learning Algorithm (a minimal DANN-style sketch is given after this table) |
| Open Source Code | Yes | Our algorithm is implemented in JAX (Bradbury et al., 2018) (digits and NLP benchmarks) and PyTorch (visual task). See also the PyTorch pseudo-code in Appendix E. |
| Open Datasets | Yes | This benchmark consists of two digit datasets, MNIST (CC BY-SA 3.0) and USPS (LeCun et al., 1998; Long et al., 2018), with two transfer tasks (M → U and U → M). ... Specifically, this analysis is conducted on the VisDA-2017 benchmark (Peng et al., 2017). ... We also evaluate our approach on natural language processing tasks on the Amazon product reviews dataset (Blitzer et al., 2006). |
| Dataset Splits | Yes | Hyperparameters such as the learning rate, learning schedule, and adaptation coefficient (λ) are determined for all optimizers by running a dense grid search and selecting the best hyperparameters on the transfer task M → U. As is usual in UDA, model selection is based on best transfer accuracy. |
| Hardware Specification | Yes | Experiments are conducted on NVIDIA Titan V and V100 GPU cards. |
| Software Dependencies | Yes | Our algorithm is implemented in JAX (Bradbury et al., 2018) (digits and NLP benchmarks) and PyTorch (visual task). ... For this experiment, we use PyTorch (Paszke et al., 2019)... Specifically, we compute the average over 100 runs on an NVIDIA TITAN V GPU, CUDA version 10.1, and PyTorch version 1.5.1. |
| Experiment Setup | Yes | For GD-NM, we use the default momentum value (0.9). We follow the same approach for the additional hyper-parameters of Adam. Hyperparameters such as the learning rate, learning schedule, and adaptation coefficient (λ) are determined for all optimizers by running a dense grid search and selecting the best hyperparameters on the transfer task M → U. ... we use LeNet (LeCun et al., 1998) as the backbone architecture, dropout (0.5), fix the batch size to 32 and the number of iterations per epoch to 937, and use weight decay (0.005). (See the optimizer and grid-search sketch after this table.) |
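
The Pseudocode and Open Source Code rows above refer to Algorithm 1 and the PyTorch pseudo-code in Appendix E, neither of which is reproduced on this page. As a point of reference, the following is a minimal sketch of a generic DANN-style domain-adversarial update with a gradient-reversal layer; the module and variable names (`feat`, `clf`, `disc`, `lambda_`) are illustrative assumptions and do not reproduce the authors' proposed optimizers.

```python
# Minimal DANN-style domain-adversarial step (sketch; not the paper's Algorithm 1).
import torch
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambda_ in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None


def grad_reverse(x, lambda_=1.0):
    return GradReverse.apply(x, lambda_)


def domain_adversarial_step(feat, clf, disc, opt, xs, ys, xt, lambda_):
    """One update: task loss on source data plus domain-confusion loss on source and target."""
    opt.zero_grad()
    zs, zt = feat(xs), feat(xt)
    task_loss = F.cross_entropy(clf(zs), ys)
    # Domain labels: 1 for source samples, 0 for target samples.
    d_logits = disc(grad_reverse(torch.cat([zs, zt]), lambda_)).squeeze(-1)
    d_labels = torch.cat([torch.ones(len(zs)), torch.zeros(len(zt))]).to(d_logits.device)
    domain_loss = F.binary_cross_entropy_with_logits(d_logits, d_labels)
    (task_loss + domain_loss).backward()
    opt.step()
    return task_loss.item(), domain_loss.item()
```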
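
The Experiment Setup and Dataset Splits rows quote a GD-NM optimizer (Nesterov momentum 0.9, weight decay 0.005) and a dense grid search over the learning rate and adaptation coefficient λ, selected by transfer accuracy on M → U. A minimal sketch of that setup is below; the candidate grids and the `train_and_eval` helper are hypothetical placeholders, not values or code from the paper.

```python
# Sketch of the GD-NM optimizer and the dense hyperparameter grid search on M -> U.
import itertools
import torch


def make_gd_nm_optimizer(params, lr):
    # GD-NM: gradient descent with Nesterov momentum, using the quoted defaults
    # (momentum 0.9, weight decay 0.005).
    return torch.optim.SGD(params, lr=lr, momentum=0.9, nesterov=True, weight_decay=0.005)


def grid_search_m_to_u(train_and_eval):
    """train_and_eval(lr, lambda_) -> transfer accuracy on M -> U (hypothetical helper)."""
    lrs = [0.3, 0.1, 0.03, 0.01]        # placeholder learning-rate grid
    lambdas = [0.01, 0.1, 0.5, 1.0]     # placeholder adaptation-coefficient grid
    best_lr, best_lambda = max(
        itertools.product(lrs, lambdas),
        key=lambda cfg: train_and_eval(*cfg),
    )
    return best_lr, best_lambda  # configuration with the best transfer accuracy
```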