Domain Adaptive Multibranch Networks

Authors: Róger Bermúdez-Chacón, Mathieu Salzmann, Pascal Fua

ICLR 2020

Each entry below lists a reproducibility variable, the assessed result, and the LLM response quoting the paper's supporting evidence.
Research Type: Experimental
LLM Response: Our experiments demonstrate that our Domain Adaptive Multibranch Networks, which we will refer to as DAMNets, not only outperform the original technique of Ganin & Lempitsky (2015), but also the state-of-the-art strategy for untying the source and target weights of Rozantsev et al. (2019), which relies on the same domain classifier. We evaluate our method on the task of image recognition, for which we use several domain adaptation benchmark problems: Digits, which comprises three domains: MNIST (LeCun et al., 1998), MNIST-M (Ganin & Lempitsky, 2015), and SVHN (Netzer et al., 2011); Office (Saenko et al., 2010), which contains three domains: Amazon, DSLR, and Webcam; Office-Home (Venkateswara et al., 2017), with domains Art, Clipart, Product, and Real; and VisDA17 (Peng et al., 2018), with Synthetic and Real images. The results for the digit recognition and Office-Home datasets are provided in Table 1.

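The "domain classifier" in this quote is the adversarial classifier of Ganin & Lempitsky (2015), trained through a gradient reversal layer. As illustration only, here is a minimal PyTorch sketch of such a layer; the class and function names are ours, not the paper's:

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; flips and scales the gradient in the
    backward pass, as in the gradient reversal layer of Ganin & Lempitsky (2015)."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient flowing back into the feature extractor;
        # the second return value is the (non-)gradient for lambd.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```

The domain classifier sits on top of grad_reverse(features), so minimizing its loss pushes the shared feature extractor toward domain-invariant representations.
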
Researcher Affiliation: Academia
LLM Response: Róger Bermúdez-Chacón, Mathieu Salzmann, & Pascal Fua, Computer Vision Laboratory, École Polytechnique Fédérale de Lausanne, Station 14, CH-1015 Lausanne, Switzerland. {roger.bermudez,mathieu.salzmann,pascal.fua}@epfl.ch

Pseudocode: No
LLM Response: The paper provides architectural diagrams (e.g., Figures 1, 2, 3, 4) but does not include formal pseudocode or algorithm blocks.

Open Source Code: No
LLM Response: We will make our code publicly available upon acceptance of the paper.

Open Datasets: Yes
LLM Response: Digits, which comprises three domains: MNIST (LeCun et al., 1998), MNIST-M (Ganin & Lempitsky, 2015), and SVHN (Netzer et al., 2011); Office (Saenko et al., 2010), which contains three domains: Amazon, DSLR, and Webcam; Office-Home (Venkateswara et al., 2017), with domains Art, Clipart, Product, and Real; and VisDA17 (Peng et al., 2018), with Synthetic and Real images. We evaluate our method for the detection of drones from video frames, on the UAV-200 dataset (Rozantsev et al., 2018).

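Of these, MNIST and SVHN ship with standard torchvision loaders, sketched below with "data" as a placeholder root directory; MNIST-M, Office, Office-Home, VisDA17, and UAV-200 have no built-in loader and must be downloaded from their respective sources:

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# Two of the digit domains, via standard torchvision dataset classes.
mnist_train = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
svhn_train = datasets.SVHN(root="data", split="train", download=True, transform=to_tensor)
```
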
Dataset Splits: Yes
LLM Response: The standard training and testing splits contain 60,000 and 10,000 examples, respectively (for MNIST). We perform this training on the predefined training splits, when available, or on 75% of the images, otherwise. Our On TD oracle is trained on either the preset training splits, when available, or our defined training data, and evaluated on the corresponding test data. For the comparison to this oracle to be meaningful, we follow the same strategy for our DAMNets. That is, we use the unlabeled target data from the training splits only and report results on the testing splits. We evaluate performance on a validation set comprising 3,000 positive and 135,000 negative patches.

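The quoted "75% of the images" fallback for datasets without predefined splits could be reproduced along these lines; a PyTorch sketch, with the fraction taken from the quote and an arbitrary seed:

```python
import torch
from torch.utils.data import random_split

def make_splits(dataset, train_fraction=0.75, seed=0):
    """Split a dataset 75/25 into train/test when no predefined splits exist."""
    n_train = int(len(dataset) * train_fraction)
    generator = torch.Generator().manual_seed(seed)  # fixed seed for a reproducible split
    return random_split(dataset, [n_train, len(dataset) - n_train], generator=generator)
```
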
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., CPU/GPU models, memory sizes) used for running the experiments.

Software Dependencies: No
LLM Response: The paper mentions software components like "Stochastic Gradient Descent" and "AdaDelta" as optimizer, but does not provide specific version numbers for any software dependencies.

Experiment Setup: Yes
LLM Response: We initialize our network parameters by training the original versions of those architectures on the source domains, either from scratch, for simple architectures, or by fine-tuning weights learned on ImageNet, for very deep ones. The initial values of the gate parameters are defined so as to set the activations to 1/K, for each of the K branches. To train our networks, we use Stochastic Gradient Descent with a momentum of 0.9 and a variable learning rate defined by the annealing schedule of Ganin & Lempitsky (2015) as µ_p = µ_0 / (1 + α p)^β, where p is the training progress relative to the total number of training epochs, µ_0 is the initial learning rate, which we take to be 10^-2, and α = 10 and β = 0.75, as in Ganin & Lempitsky (2015). We eliminate exploding gradients by ℓ2-norm clipping. Furthermore, we modulate the plasticity of the activations at every gate as π(i) = 1 − p, that is, we make π(i) decay linearly as training progresses. As data preprocessing, we apply mean subtraction, as in Ganin & Lempitsky (2015). We train for 200 epochs, during which the network is exposed to all the image data from the source and target domains, but only to the annotations from the source domain(s). We feed our DAMNets images resized to 224 × 224 pixels, as expected by ResNet-50.

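Putting the quoted hyperparameters together, here is a minimal PyTorch sketch of this optimization setup. Only the schedule µ_p = µ_0 / (1 + α p)^β with µ_0 = 10^-2, α = 10, β = 0.75, the momentum of 0.9, the 1/K gate initialization, and the 200 epochs come from the quote; the placeholder model, dummy loss, and clipping threshold are assumptions:

```python
import torch

# Placeholder standing in for a DAMNet; the real architecture is multibranch.
model = torch.nn.Linear(10, 2)

mu0, alpha, beta, total_epochs = 1e-2, 10.0, 0.75, 200

# Gate init: zero logits so that softmax yields 1/K per branch (K = 3 assumed).
gate_logits = torch.zeros(3)
gate_activations = torch.softmax(gate_logits, dim=0)  # tensor([1/3, 1/3, 1/3])

optimizer = torch.optim.SGD(model.parameters(), lr=mu0, momentum=0.9)
# Annealing schedule of Ganin & Lempitsky (2015): mu_p = mu_0 / (1 + alpha * p)**beta,
# with p the training progress in [0, 1].
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda epoch: 1.0 / (1.0 + alpha * epoch / total_epochs) ** beta,
)

for epoch in range(total_epochs):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy loss for illustration
    loss.backward()
    # l2-norm gradient clipping; the max_norm value is an assumption, not from the paper.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```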