Learning Transferable Features with Deep Adaptation Networks

Authors: Mingsheng Long, Yue Cao, Jianmin Wang, Michael I. Jordan

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical evidence shows that the proposed architecture yields state-of-the-art image classification error rates on standard domain adaptation benchmarks.
Researcher Affiliation | Academia | School of Software, TNList Lab for Info. Sci. & Tech., Institute for Data Science, Tsinghua University, China; Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA, USA.
Pseudocode | No | The paper describes the algorithm steps in prose in Section 3.2 but does not provide structured pseudocode or a formal algorithm block.
Open Source Code | No | The paper mentions implementing CNN-based methods "based on the Caffe (Jia et al., 2014) implementation of AlexNet (Krizhevsky et al., 2012)" but does not provide an explicit statement or link indicating that the authors' own Deep Adaptation Network (DAN) code is open-sourced.
Open Datasets | Yes | Office-31 (Saenko et al., 2010): "This dataset is a standard benchmark for domain adaptation..." Office-10 + Caltech-10 (Gong et al., 2012): "This dataset consists of the 10 common categories shared by the Office-31 and Caltech-256 (C) (Griffin et al., 2007) datasets and is widely adopted in transfer learning methods (Long et al., 2013; Baktashmotlagh et al., 2013)."
Dataset Splits | No | The paper states, "we can automatically select the MMD penalty parameter λ on a validation set (comprised of source-labeled instances and target-unlabeled instances) by jointly assessing the test errors of the source classifier and the two-sample classifier." However, it does not specify the size or exact split methodology for this validation set.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions the use of Caffe and AlexNet and cites the papers for these frameworks, but it does not provide specific version numbers for these software dependencies as required for reproducibility.
Experiment Setup | Yes | "We use the fine-tuning architecture (Yosinski et al., 2014)... we fix convolutional layers conv1–conv3 that were copied from the pre-trained model, fine-tune conv4–conv5 and fully connected layers fc6–fc7, and train classifier layer fc8... we set its learning rate to be 10 times that of the lower layers. We use stochastic gradient descent (SGD) with 0.9 momentum and the learning rate annealing strategy implemented in Caffe, and cross-validate the base learning rate between 10^-5 and 10^-2 with a multiplicative step-size of 10^(1/2). ...we use a Gaussian kernel k(x_i, x_j) = exp(−‖x_i − x_j‖² / γ) with the bandwidth γ set to the median pairwise distances on the training data (the median heuristic)... We use multi-kernel MMD for DAN, and consider a family of m Gaussian kernels {k_u}, u = 1..m, by varying bandwidth γ_u between 2^-8 γ and 2^8 γ with a multiplicative step-size of 2^(1/2)."
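The multi-kernel MMD setup quoted in the Experiment Setup row can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: it uses the biased MMD² estimator, takes γ as the median pairwise squared distance (one common reading of the median heuristic), and samples only a few bandwidths from the 2^-8 γ .. 2^8 γ family rather than the full 2^(1/2)-spaced grid.

```python
import numpy as np

def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and Y."""
    return np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)

def median_heuristic(X):
    """Bandwidth gamma = median pairwise squared distance over the
    pooled training data (one common form of the median heuristic)."""
    d2 = sq_dists(X, X)
    return np.median(d2[np.triu_indices_from(d2, k=1)])

def mk_mmd(Xs, Xt, m=5):
    """Biased multi-kernel MMD^2 estimate between source features Xs and
    target features Xt, averaging m Gaussian kernels with bandwidths
    spread over 2^-8 * gamma .. 2^8 * gamma around the median-heuristic
    gamma (the paper's grid uses a finer 2^(1/2) step)."""
    gamma = median_heuristic(np.vstack([Xs, Xt]))
    total = 0.0
    for e in np.linspace(-8.0, 8.0, m):
        g = (2.0 ** e) * gamma
        k = lambda A, B: np.exp(-sq_dists(A, B) / g)
        # MMD^2 = E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]
        total += k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2.0 * k(Xs, Xt).mean()
    return total / m
```

In DAN this quantity is added as a penalty (weighted by λ) on the fc-layer activations of the source and target batches, so minimizing it pulls the two feature distributions together; averaging several bandwidths avoids committing to a single kernel scale.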