Multi-Adversarial Domain Adaptation

Authors: Zhongyi Pei, Zhangjie Cao, Mingsheng Long, Jianmin Wang

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence demonstrates that the proposed model outperforms state-of-the-art methods on standard domain adaptation datasets.
Researcher Affiliation | Academia | Zhongyi Pei, Zhangjie Cao, Mingsheng Long, Jianmin Wang (KLiss, MOE; NEL-BDS; TNList; School of Software, Tsinghua University, China); {peizhyi,caozhangjie14}@gmail.com, {mingsheng,jimwang}@tsinghua.edu.cn
Pseudocode | No | The information is insufficient. The paper includes an architecture diagram (Figure 2) but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The codes, datasets and configurations will be available online at github.com/thuml."
Open Datasets | Yes | Office-31 (Saenko et al. 2010) is a standard benchmark for visual domain adaptation, and ImageCLEF-DA (http://imageclef.org/2014/adaptation) is a benchmark dataset from the ImageCLEF 2014 domain adaptation challenge.
Dataset Splits | Yes | "We follow standard evaluation protocols for unsupervised domain adaptation (Long et al. 2015; Ganin and Lempitsky 2015). For both Office-31 and ImageCLEF-DA datasets, we use all labeled source examples and all unlabeled target examples. ... We also adopt transfer cross-validation (Zhong et al. 2010) to select parameter λ for the MADA models."
Hardware Specification | No | The information is insufficient. The paper does not specify any hardware components (e.g., GPU or CPU models) used to run the experiments.
Software Dependencies | No | The information is insufficient. The paper mentions the Caffe (Jia et al. 2014) framework but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | The paper employs mini-batch stochastic gradient descent (SGD) with momentum 0.9 and the learning-rate strategy implemented in RevGrad (Ganin and Lempitsky 2015): the learning rate is not selected by grid search (due to high computational cost) but is instead annealed during SGD as η_p = η_0 / (1 + αp)^β, where p is the training progress changing linearly from 0 to 1, η_0 = 0.01, α = 10, and β = 0.75. To suppress noisy activations at the early stages of training, the parameter λ is not fixed but is gradually increased via λ_p = 2 / (1 + exp(−δp)) − 1, where δ = 10 (Ganin and Lempitsky 2015).
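The two annealing schedules quoted in the Experiment Setup row can be sketched as plain functions of the training progress p. This is a minimal illustration of the formulas only, not the authors' code; the function names and the idea of stepping p per iteration are assumptions.

```python
import math

def learning_rate(p, eta0=0.01, alpha=10.0, beta=0.75):
    """RevGrad-style annealed learning rate: eta_p = eta0 / (1 + alpha * p) ** beta.

    p is the training progress, changing linearly from 0 to 1.
    """
    return eta0 / (1.0 + alpha * p) ** beta

def lambda_schedule(p, delta=10.0):
    """Gradually ramp the adversarial weight from 0 toward 1:
    lambda_p = 2 / (1 + exp(-delta * p)) - 1.
    """
    return 2.0 / (1.0 + math.exp(-delta * p)) - 1.0

# At the start of training the learning rate is eta0 and lambda is 0;
# both schedules then move monotonically as p grows to 1.
print(learning_rate(0.0), lambda_schedule(0.0))
```

Both schedules depend only on the normalized progress p, so in a training loop p would typically be computed as the current iteration divided by the total number of iterations.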