Deep Transfer Learning with Joint Adaptation Networks

Authors: Mingsheng Long, Han Zhu, Jianmin Wang, Michael I. Jordan

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments testify that our model yields state of the art results on standard datasets.
Researcher Affiliation | Academia | (1) Key Lab for Information System Security, MOE; Tsinghua National Lab for Information Science and Technology (TNList); NELBDS; School of Software, Tsinghua University, Beijing 100084, China; (2) University of California, Berkeley, Berkeley 94720.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Codes and datasets are available at http://github.com/thuml.
Open Datasets | Yes | Office-31 (Saenko et al., 2010) is a standard benchmark for domain adaptation in computer vision... and ImageCLEF-DA (http://imageclef.org/2014/adaptation) is a benchmark dataset for the ImageCLEF 2014 domain adaptation challenge...
Dataset Splits | Yes | We perform model selection by tuning hyper-parameters using transfer cross-validation (Zhong et al., 2010).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments.
Software Dependencies | No | The paper mentions the Caffe framework but does not provide specific version numbers for Caffe or any other software dependencies.
Experiment Setup | Yes | We use mini-batch stochastic gradient descent (SGD) with momentum of 0.9 and the learning rate annealing strategy of RevGrad (Ganin & Lempitsky, 2015). The learning rate is not selected by a grid search due to its high computational cost; instead, it is adjusted during SGD using the formula η_p = η_0 / (1 + α p)^β, where p is the training progress changing linearly from 0 to 1, η_0 = 0.01, α = 10, and β = 0.75, which is optimized to promote convergence and low error on the source domain. To suppress noisy activations at the early stages of training, instead of fixing the adaptation factor λ, we gradually change it from 0 to 1 by the progressive schedule λ_p = 2 / (1 + exp(-γ p)) - 1, where γ = 10 is fixed throughout the experiments (Ganin & Lempitsky, 2015).
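
Both schedules in the experiment setup are closed-form functions of the training progress p, so they are easy to reproduce. Below is a minimal sketch in Python; the function names, the NumPy dependency, and the demo loop are our assumptions for illustration, not from the paper.

import numpy as np

def learning_rate(p, eta0=0.01, alpha=10.0, beta=0.75):
    """Annealed learning rate eta_p = eta0 / (1 + alpha * p) ** beta,
    where p in [0, 1] is the linear training progress (hypothetical
    helper; defaults taken from the paper's reported hyper-parameters)."""
    return eta0 / (1.0 + alpha * p) ** beta

def adaptation_factor(p, gamma=10.0):
    """Adaptation factor lambda_p = 2 / (1 + exp(-gamma * p)) - 1,
    ramping smoothly from 0 toward 1 as training progresses."""
    return 2.0 / (1.0 + np.exp(-gamma * p)) - 1.0

if __name__ == "__main__":
    # Print the two schedules at a few checkpoints of training progress.
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"p={p:.2f}  eta={learning_rate(p):.5f}  "
              f"lambda={adaptation_factor(p):.3f}")

At p = 0 this gives η = 0.01 and λ = 0; by p = 1 the learning rate has decayed to roughly 0.0017 while λ has saturated near 1, matching the intended warm-up of the adaptation term relative to the supervised loss.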