Unsupervised Domain Adaptation with Residual Transfer Networks
Authors: Mingsheng Long, Han Zhu, Jianmin Wang, Michael I. Jordan
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence shows that the new approach outperforms state-of-the-art methods on standard domain adaptation benchmarks. |
| Researcher Affiliation | Academia | KLiss, MOE; TNList; School of Software, Tsinghua University, China; University of California, Berkeley, Berkeley, USA. {mingsheng,jimwang}@tsinghua.edu.cn, zhuhan10@gmail.com, jordan@berkeley.edu |
| Pseudocode | No | The paper describes algorithms and formulations using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes and datasets will be available at https://github.com/thuml/transfer-caffe. |
| Open Datasets | Yes | Office-31 [13] is a benchmark for domain adaptation... Office-Caltech [14] is built by selecting the 10 common categories shared by Office-31 and Caltech-256 (C). |
| Dataset Splits | Yes | We follow standard protocols and use all labeled source data and all unlabeled target data for domain adaptation [5]. For all methods, we perform cross-validation on labeled source data to select candidate parameters, then conduct validation on transfer task A → W by requiring one labeled example per category from target domain W as the validation set, and fix the selected parameters throughout all transfer tasks. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | We implement all deep methods based on the Caffe deep-learning framework, and fine-tune from Caffe-provided models of AlexNet [26] pre-trained on ImageNet. While Caffe and AlexNet are mentioned, specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | We set their learning rate to be 10 times that of the other layers. We use mini-batch stochastic gradient descent (SGD) with momentum of 0.9 and the learning rate annealing strategy implemented in RevGrad [6]: the learning rate is not selected through a grid search due to its high computational cost; instead it is adjusted during SGD using the formula η_p = η_0 / (1 + αp)^β, where p is the training progress linearly changing from 0 to 1, η_0 = 0.01, α = 10, and β = 0.75... the MMD penalty parameter λ and entropy penalty γ are first selected on A → W and then fixed as λ = 0.3, γ = 0.3 for all other transfer tasks. (A minimal sketch of this learning-rate schedule appears below the table.) |
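
As a rough illustration of the reported schedule (not the authors' Caffe implementation), the Python sketch below evaluates η_p = η_0 / (1 + αp)^β with the stated values η_0 = 0.01, α = 10, β = 0.75; the iteration budget used in the usage example is hypothetical.

```python
# Sketch of the learning-rate annealing schedule reported in the paper
# (RevGrad-style "inv" policy): eta_p = eta_0 / (1 + alpha * p) ** beta,
# where p is the training progress, linearly increasing from 0 to 1.

def annealed_lr(p: float, eta0: float = 0.01, alpha: float = 10.0, beta: float = 0.75) -> float:
    """Return the learning rate at training progress p in [0, 1]."""
    assert 0.0 <= p <= 1.0, "training progress must lie in [0, 1]"
    return eta0 / (1.0 + alpha * p) ** beta

if __name__ == "__main__":
    total_iters = 50_000  # hypothetical iteration budget, not from the paper
    for it in (0, 10_000, 25_000, 50_000):
        p = it / total_iters
        print(f"iter {it:6d}  p={p:.2f}  lr={annealed_lr(p):.5f}")
```

With these values the rate starts at 0.01 (p = 0) and decays to roughly 0.0017 at the end of training (p = 1), which matches the monotone annealing behavior the paper attributes to the RevGrad schedule.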