AnyDA: Anytime Domain Adaptation

Authors: Omprakash Chakraborty, Aadarsh Sahoo, Rameswar Panda, Abir Das

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments on 4 benchmark datasets and demonstrate that AnyDA achieves superior performance over the state-of-the-art methods, more significantly at lower computation budgets. We also include comprehensive ablation studies to depict the importance of each module of our proposed framework.
Researcher Affiliation | Collaboration | ¹IIT Kharagpur, ²MIT-IBM Watson AI Lab; {opckgp@,abir@cse.}iitkgp.ac.in, {aadarsh,rpanda}@ibm.com
Pseudocode | Yes | The training pseudocode for AnyDA is shown in Algorithm 1.
Open Source Code | Yes | Project page: https://cvir.github.io/projects/anyda
Open Datasets | Yes | The dataset is publicly available to download at: https://people.eecs.berkeley.edu/~jhoffman/domainadapt/#datasets_code (Office-31)
Dataset Splits | Yes | In addition, following the general practice, we use a validation set to obtain the best hyperparameters.
Hardware Specification | Yes | All the experiments were performed using 4 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using ResNet-50 and MobileNetV3 as network architectures, and DANN as a domain adaptation method, but does not specify software dependencies like Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | For Eqn. 1, we use τstu = 0.1 and τtea = 0.04. We use a momentum hyperparameter value of λ = 0.96. In Eqn. 3, a threshold value of τpl = 0.9 was used for the Office-31 and Office-Home datasets, while τpl = 0.4 for the DomainNet dataset. In Eqn. 4, we use λcls = 1, 15, 64 and λrd = 1, 1, 0.5 for Office-31, Office-Home and DomainNet, respectively, and λpl = 0.1 for all the datasets. We perform warm-up using source data for 100, 100, 30 epochs for Office-31, Office-Home and DomainNet, respectively. The proposed approach AnyDA is then trained for 30, 100, 20 epochs, respectively. We use a per-GPU batch size of 64 (32 source + 32 target) for all the experiments. We use a learning rate of 2e-4 for Office-31 and Office-Home, and 3e-5 for DomainNet. We follow cosine annealing to update the learning rate.
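
The quoted hyperparameters map naturally onto a training configuration. Below is a minimal, hypothetical PyTorch-style sketch (not the authors' released code) that collects the reported values in one place: the student/teacher distillation temperatures from Eqn. 1, the EMA momentum λ = 0.96 for the teacher update, the pseudo-label threshold from Eqn. 3, the per-dataset loss weights from Eqn. 4, and a cosine-annealed learning rate. All names (e.g. AnyDAConfig, update_teacher, distillation_loss) and the exact form of the EMA and distillation terms are illustrative assumptions; only the numeric values come from the paper.

```python
# Hypothetical sketch of the reported AnyDA training hyperparameters.
# Structure and function names are assumptions; numeric values are quoted from the paper.
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class AnyDAConfig:
    tau_stu: float = 0.1          # student softmax temperature (Eqn. 1)
    tau_tea: float = 0.04         # teacher softmax temperature (Eqn. 1)
    ema_momentum: float = 0.96    # λ for the momentum (EMA) teacher update
    tau_pl: float = 0.9           # pseudo-label threshold (0.4 for DomainNet)
    lambda_cls: float = 1.0       # 1 / 15 / 64 for Office-31 / Office-Home / DomainNet
    lambda_rd: float = 1.0        # 1 / 1 / 0.5 for the three benchmarks
    lambda_pl: float = 0.1        # same for all datasets
    lr: float = 2e-4              # 3e-5 for DomainNet
    batch_size_per_gpu: int = 64  # 32 source + 32 target
    warmup_epochs: int = 100      # 100 / 100 / 30 source-only warm-up epochs
    train_epochs: int = 30        # 30 / 100 / 20 adaptation epochs


@torch.no_grad()
def update_teacher(student, teacher, momentum: float = 0.96):
    """Assumed form of the EMA teacher update with momentum λ."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)


def distillation_loss(student_logits, teacher_logits, cfg: AnyDAConfig):
    """Temperature-scaled teacher-student cross-entropy, a common form of a loss like Eqn. 1."""
    teacher_probs = F.softmax(teacher_logits / cfg.tau_tea, dim=-1)
    student_logp = F.log_softmax(student_logits / cfg.tau_stu, dim=-1)
    return -(teacher_probs * student_logp).sum(dim=-1).mean()


# Cosine-annealed learning rate over the adaptation epochs, as reported.
cfg = AnyDAConfig()
model = torch.nn.Linear(2048, 31)  # placeholder; the paper uses ResNet-50 / MobileNetV3 backbones
optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=cfg.train_epochs)
```

With λ = 0.96 the teacher stays a slowly moving average of the student, and the much lower teacher temperature (0.04 vs. 0.1) sharpens the teacher's output distribution relative to the student's, which is the usual role of such asymmetric temperatures in self-distillation.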