Domain Adversarial Training: A Game Perspective

Authors: David Acuna, Marc T Law, Guojun Zhang, Sanja Fidler

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that in conjunction with state-of-the-art domain-adversarial methods, we achieve up to 3.5% improvement with less than half of training iterations. Our optimizers are easy to implement, free of additional parameters, and can be plugged into any domain-adversarial framework.
Researcher Affiliation | Collaboration | David Acuna (1,2,3), Marc T. Law (2), Guojun Zhang (3,4), Sanja Fidler (1,2,3); 1: University of Toronto, 2: NVIDIA, 3: Vector Institute, 4: University of Waterloo
Pseudocode | Yes | Algorithm 1: Pseudo-Code of the Proposed Learning Algorithm
Open Source Code | Yes | Our algorithm is implemented in JAX (Bradbury et al., 2018) (Digits, NLP benchmark) and PyTorch (Visual Task). See also PyTorch pseudo-code in Appendix E.
Open Datasets | Yes | This benchmark consists of two digit datasets, MNIST (CC BY-SA 3.0) and USPS (LeCun et al., 1998; Long et al., 2018), with two transfer tasks (M → U and U → M). ... Specifically, this analysis is conducted on the VisDA-2017 benchmark (Peng et al., 2017). ... We also evaluate our approach on natural language processing tasks on the Amazon product reviews dataset (Blitzer et al., 2006).
Dataset Splits | Yes | Hyperparameters such as the learning rate, learning-rate schedule and adaptation coefficient (λ) are determined for all optimizers by running a dense grid search and selecting the best hyperparameters on the transfer task M → U. As is usual in UDA, the selection criterion is best transfer accuracy.
Hardware Specification | Yes | Experiments are conducted on NVIDIA Titan V and V100 GPU cards.
Software Dependencies | Yes | Our algorithm is implemented in JAX (Bradbury et al., 2018) (Digits, NLP benchmark) and PyTorch (Visual Task). ... For this experiment, we use PyTorch (Paszke et al., 2019). ... Specifically, we compute the average over 100 runs on an NVIDIA TITAN V GPU with CUDA version 10.1 and PyTorch version 1.5.1.
Experiment Setup | Yes | For GD-NM, we use the default momentum value (0.9). We follow the same approach for the additional hyperparameters of Adam. Hyperparameters such as the learning rate, learning-rate schedule and adaptation coefficient (λ) are determined for all optimizers by running a dense grid search and selecting the best hyperparameters on the transfer task M → U. ... We use LeNet (LeCun et al., 1998) as the backbone architecture with dropout (0.5), fix the batch size to 32 and the number of iterations per epoch to 937, and use weight decay (0.005).
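
The Pseudocode and Open Source Code rows point to the paper's Algorithm 1 and its PyTorch pseudo-code in Appendix E, neither of which is reproduced on this page. For orientation only, the sketch below shows a generic gradient-reversal (DANN-style) domain-adversarial update in PyTorch; the module and function names are ours, and the paper's proposed game-perspective optimizers are not reflected here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def adversarial_step(featurizer, classifier, domain_disc, optimizer,
                     x_src, y_src, x_tgt, lambd=1.0):
    """One generic domain-adversarial update (illustrative, not the paper's Algorithm 1)."""
    optimizer.zero_grad()
    z_src, z_tgt = featurizer(x_src), featurizer(x_tgt)
    # Supervised task loss on labelled source data.
    task_loss = F.cross_entropy(classifier(z_src), y_src)
    # Domain loss on gradient-reversed features: 0 = source, 1 = target.
    z = torch.cat([z_src, z_tgt])
    d = torch.cat([torch.zeros(len(z_src)), torch.ones(len(z_tgt))]).long().to(z.device)
    domain_loss = F.cross_entropy(domain_disc(GradReverse.apply(z, lambd)), d)
    (task_loss + domain_loss).backward()
    optimizer.step()
    return task_loss.item(), domain_loss.item()
```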
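
The Dataset Splits and Experiment Setup rows describe selecting the learning rate, schedule and adaptation coefficient λ by a dense grid search on the M → U task, with a LeNet backbone, dropout 0.5, batch size 32, 937 iterations per epoch and weight decay 0.005. A minimal sketch of how such a selection loop could be wired is below; `build_lenet`, `train_and_evaluate` and the grid values are hypothetical placeholders, and reading GD-NM as SGD with Nesterov momentum 0.9 is our assumption.

```python
import itertools
import torch

# Hypothetical hyperparameter grid (the paper's actual grid is not listed on this page).
learning_rates = [0.3, 0.1, 0.03, 0.01]
lambdas = [0.1, 0.5, 1.0]          # adaptation coefficient λ
schedules = ["constant", "inv"]    # learning-rate schedule

def select_hyperparameters(build_lenet, train_and_evaluate):
    """Pick the configuration with the best M -> U transfer accuracy."""
    best_config, best_acc = None, -1.0
    for lr, lambd, sched in itertools.product(learning_rates, lambdas, schedules):
        model = build_lenet(dropout=0.5)  # LeNet backbone with dropout 0.5
        # GD-NM read here as SGD with Nesterov momentum 0.9 (assumption).
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9,
                                    nesterov=True, weight_decay=0.005)
        acc = train_and_evaluate(model, optimizer, lambd=lambd, schedule=sched,
                                 batch_size=32, iters_per_epoch=937,
                                 source="MNIST", target="USPS")
        if acc > best_acc:
            best_config, best_acc = (lr, lambd, sched), acc
    return best_config, best_acc
```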
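
The Software Dependencies row mentions averaging a runtime measurement over 100 runs on a TITAN V. A common way to time a GPU step in PyTorch, and presumably what such a measurement looks like, is sketched below; `step_fn` is a placeholder for whatever update is being timed.

```python
import torch

def average_gpu_time_ms(step_fn, runs=100, warmup=10):
    """Average wall-clock time of step_fn over `runs` executions, in milliseconds."""
    for _ in range(warmup):          # warm-up so one-off setup cost is excluded
        step_fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(runs):
        step_fn()
    end.record()
    torch.cuda.synchronize()         # wait for all queued kernels to finish
    return start.elapsed_time(end) / runs
```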