Unsupervised Domain Adaptation with Residual Transfer Networks

Authors: Mingsheng Long, Han Zhu, Jianmin Wang, Michael I. Jordan

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence shows that the new approach outperforms state-of-the-art methods on standard domain adaptation benchmarks.
Researcher Affiliation | Academia | KLiss, MOE; TNList; School of Software, Tsinghua University, China; University of California, Berkeley, USA. {mingsheng,jimwang}@tsinghua.edu.cn, zhuhan10@gmail.com, jordan@berkeley.edu
Pseudocode | No | The paper describes algorithms and formulations using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Codes and datasets will be available at https://github.com/thuml/transfer-caffe.
Open Datasets | Yes | Office-31 [13] is a benchmark for domain adaptation... Office-Caltech [14] is built by selecting the 10 common categories shared by Office-31 and Caltech-256 (C).
Dataset Splits | Yes | We follow standard protocols and use all labeled source data and all unlabeled target data for domain adaptation [5]. For all methods, we perform cross-validation on labeled source data to select candidate parameters, then conduct validation on the transfer task A → W by requiring one labeled example per category from target domain W as the validation set, and fix the selected parameters throughout all transfer tasks.
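The one-labeled-example-per-category validation set described above can be sketched as follows. This is a minimal illustration, not the authors' code; the list-of-(sample, label) input format and the helper name `one_per_class_validation` are my own assumptions.

```python
def one_per_class_validation(labeled_target):
    """Split a labeled target-domain set into a validation set holding
    one example per category, and the remainder.

    labeled_target: list of (sample, label) pairs (assumed format).
    Returns (validation, remainder)."""
    seen = set()
    validation, remainder = [], []
    for sample, label in labeled_target:
        if label not in seen:
            seen.add(label)          # first example of this category
            validation.append((sample, label))
        else:
            remainder.append((sample, label))
    return validation, remainder
```

Under this protocol, hyperparameters tuned on the A → W validation set are then frozen for every other transfer task.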
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | We implement all deep methods based on the Caffe deep-learning framework, and fine-tune from the Caffe-provided AlexNet [26] model pre-trained on ImageNet. While Caffe and AlexNet are mentioned, specific version numbers for these software components are not provided.
Experiment Setup | Yes | We set their learning rate to be 10 times that of the other layers. We use mini-batch stochastic gradient descent (SGD) with a momentum of 0.9 and the learning-rate annealing strategy implemented in RevGrad [6]: the learning rate is not selected through a grid search due to its high computational cost; instead, it is adjusted during SGD using the formula η_p = η_0 / (1 + αp)^β, where p is the training progress changing linearly from 0 to 1, η_0 = 0.01, α = 10, and β = 0.75... The MMD penalty parameter λ and the entropy penalty γ are first selected on A → W and then fixed as λ = 0.3, γ = 0.3 for all other transfer tasks.
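The RevGrad annealing schedule quoted above can be sketched directly. This is a minimal illustration assuming the schedule η_p = η_0 / (1 + αp)^β with the reported constants; the function name `annealed_lr` is my own.

```python
def annealed_lr(p, eta0=0.01, alpha=10.0, beta=0.75):
    """Annealed learning rate at training progress p in [0, 1]:
    eta_p = eta0 / (1 + alpha * p) ** beta."""
    return eta0 / (1.0 + alpha * p) ** beta
```

The schedule starts at η_0 = 0.01 when p = 0 and decays monotonically as training progresses, without any grid search over the learning rate.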