Unsupervised Domain Adaptation with Residual Transfer Networks

Authors: Mingsheng Long, Han Zhu, Jianmin Wang, Michael I. Jordan

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence shows that the new approach outperforms state-of-the-art methods on standard domain adaptation benchmarks.
Researcher Affiliation | Academia | KLiss, MOE; TNList; School of Software, Tsinghua University, China; University of California, Berkeley, USA. {mingsheng,jimwang}@tsinghua.edu.cn, zhuhan10@gmail.com, jordan@berkeley.edu
Pseudocode | No | The paper describes algorithms and formulations using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Codes and datasets will be available at https://github.com/thuml/transfer-caffe.
Open Datasets | Yes | Office-31 [13] is a benchmark for domain adaptation... Office-Caltech [14] is built by selecting the 10 common categories shared by Office-31 and Caltech-256 (C).
Dataset Splits | Yes | We follow standard protocols and use all labeled source data and all unlabeled target data for domain adaptation [5]. For all methods, we perform cross-validation on labeled source data to select candidate parameters, then conduct validation on the transfer task A → W by requiring one labeled example per category from target domain W as the validation set, and fix the selected parameters throughout all transfer tasks.
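The one-labeled-example-per-category validation set described above can be sketched as follows. This is a minimal illustration, not the authors' code; the list-of-(sample, label) input format and the helper name `one_per_class_validation` are my own assumptions.

```python
def one_per_class_validation(labeled_target):
    """Split a labeled target-domain set into a validation set holding
    one example per category, and the remainder.

    labeled_target: list of (sample, label) pairs (assumed format).
    Returns (validation, remainder)."""
    seen = set()
    validation, remainder = [], []
    for sample, label in labeled_target:
        if label not in seen:
            seen.add(label)          # first example of this category
            validation.append((sample, label))
        else:
            remainder.append((sample, label))
    return validation, remainder
```

Under this protocol, hyperparameters tuned on the A → W validation set are then frozen for every other transfer task.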
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | We implement all deep methods based on the Caffe deep-learning framework, and fine-tune from the Caffe-provided AlexNet [26] model pre-trained on ImageNet. While Caffe and AlexNet are mentioned, specific version numbers for these software components are not provided.
Experiment Setup | Yes | We set their learning rate to be 10 times that of the other layers. We use mini-batch stochastic gradient descent (SGD) with a momentum of 0.9 and the learning-rate annealing strategy implemented in RevGrad [6]: the learning rate is not selected through a grid search due to its high computational cost; instead, it is adjusted during SGD using the formula η_p = η_0 / (1 + αp)^β, where p is the training progress changing linearly from 0 to 1, η_0 = 0.01, α = 10, and β = 0.75... The MMD penalty parameter λ and the entropy penalty γ are first selected on A → W and then fixed as λ = 0.3, γ = 0.3 for all other transfer tasks.
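The RevGrad annealing schedule quoted above can be sketched directly. This is a minimal illustration assuming the schedule η_p = η_0 / (1 + αp)^β with the reported constants; the function name `annealed_lr` is my own.

```python
def annealed_lr(p, eta0=0.01, alpha=10.0, beta=0.75):
    """Annealed learning rate at training progress p in [0, 1]:
    eta_p = eta0 / (1 + alpha * p) ** beta."""
    return eta0 / (1.0 + alpha * p) ** beta
```

The schedule starts at η_0 = 0.01 when p = 0 and decays monotonically as training progresses, without any grid search over the learning rate.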