Deep Transfer Learning with Joint Adaptation Networks
Authors: Mingsheng Long, Han Zhu, Jianmin Wang, Michael I. Jordan
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments testify that our model yields state of the art results on standard datasets. |
| Researcher Affiliation | Academia | (1) Key Lab for Information System Security, MOE; Tsinghua National Lab for Information Science and Technology (TNList); NELBDS; School of Software, Tsinghua University, Beijing 100084, China; (2) University of California, Berkeley, Berkeley 94720. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes and datasets are available at http://github.com/thuml. |
| Open Datasets | Yes | Office-31 (Saenko et al., 2010) is a standard benchmark for domain adaptation in computer vision... and ImageCLEF-DA (http://imageclef.org/2014/adaptation) is a benchmark dataset for the ImageCLEF 2014 domain adaptation challenge... |
| Dataset Splits | Yes | We perform model selection by tuning hyper-parameters using transfer cross-validation (Zhong et al., 2010). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments. |
| Software Dependencies | No | The paper mentions 'Caffe framework' but does not provide specific version numbers for Caffe or any other software dependencies. |
| Experiment Setup | Yes | We use mini-batch stochastic gradient descent (SGD) with momentum of 0.9 and the learning rate annealing strategy of RevGrad (Ganin & Lempitsky, 2015): the learning rate is not selected by grid search due to its high computational cost; instead it is adjusted during SGD as η_p = η_0 / (1 + α·p)^β, where p is the training progress changing linearly from 0 to 1, η_0 = 0.01, α = 10, and β = 0.75, which is optimized to promote convergence and low error on the source domain. To suppress noisy activations at the early stages of training, instead of fixing the adaptation factor λ, we gradually change it from 0 to 1 by a progressive schedule λ_p = 2 / (1 + exp(-γ·p)) - 1, with γ = 10 fixed throughout the experiments (Ganin & Lempitsky, 2015). (A schedule sketch follows the table.) |
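
To make the two schedules quoted in the last row concrete, below is a minimal sketch in plain Python. The paper's experiments are run in the Caffe framework; the function names, the standalone implementation, and the iteration count in the usage example are illustrative assumptions, not taken from the paper.

```python
import math


def learning_rate(p, eta0=0.01, alpha=10.0, beta=0.75):
    """RevGrad-style annealing: eta_p = eta0 / (1 + alpha * p) ** beta,
    where p is the training progress in [0, 1]."""
    return eta0 / (1.0 + alpha * p) ** beta


def adaptation_factor(p, gamma=10.0):
    """Progressive schedule: lambda_p = 2 / (1 + exp(-gamma * p)) - 1,
    ramping the adaptation factor from 0 to 1 as training proceeds."""
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0


if __name__ == "__main__":
    total_iters = 10000  # illustrative value only; not specified in the table above
    for it in (0, 2500, 5000, 10000):
        p = it / total_iters
        print(f"iter {it:5d}: lr = {learning_rate(p):.5f}, "
              f"lambda = {adaptation_factor(p):.4f}")
```

At p = 0 the learning rate starts at η_0 = 0.01 and λ at 0; as p approaches 1, the learning rate decays toward roughly 0.0017 and λ saturates near 1, matching the described behavior of annealing the optimizer while gradually switching on the adaptation term.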