KD3A: Unsupervised Multi-Source Decentralized Domain Adaptation via Knowledge Distillation
Authors: Haozhe Feng, Zhaoyang You, Minghao Chen, Tianye Zhang, Minfeng Zhu, Fei Wu, Chao Wu, Wei Chen
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The extensive experiments show that KD3A significantly outperforms state-of-the-art UMDA approaches. Moreover, the KD3A is robust to the negative transfer and brings a 100× reduction of communication cost compared with other decentralized UMDA methods. |
| Researcher Affiliation | Academia | 1State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China 2College of Computer Science and Technology, Zhejiang University, Hangzhou, China 3School of Public Affairs, Zhejiang University, Hangzhou, China. |
| Pseudocode | Yes | Algorithm 1 KD3A training process with epoch t. |
| Open Source Code | Yes | In addition, our KD3A is easy to implement and we create an open-source framework to conduct KD3A on different benchmarks. |
| Open Datasets | Yes | We perform experiments on four benchmark datasets: (1) Amazon Review (Ben-David et al., 2006), (2) Digit-5 (Zhao et al., 2020), (3) Office-Caltech10 (Gong et al., 2012), (4) DomainNet (Peng et al., 2019)... |
| Dataset Splits | No | The paper describes using source domains for training and target domains for evaluation in a domain adaptation setting, but does not provide specific train/validation/test dataset splits (e.g., 80/10/10) for reproducibility within individual domains. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using PyTorch and SGD optimizer, but does not specify exact version numbers for software dependencies such as PyTorch, Python, or CUDA. |
| Experiment Setup | Yes | For model optimization, we use the SGD with 0.9 momentum as the optimizer and take the cosine schedule to decay learning rate from high (i.e. 0.05 for Amazon Review and Digit-5, and 0.005 for Office-Caltech10 and DomainNet) to zero. ... Confidence gate is the only hyper-parameter in KD3A, and should be treated carefully. ... Therefore, we gradually increase it from low (e.g., 0.8) to high (e.g., 0.95) in training. (A minimal sketch of this setup follows the table.) |
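
The Experiment Setup row above fully specifies the optimizer, learning-rate schedule, and confidence-gate ramp. The snippet below is a minimal PyTorch sketch of that setup, not the authors' released code: `model`, `num_epochs`, and the linear ramp inside `confidence_gate` are illustrative assumptions, while the SGD momentum, cosine decay to zero, initial learning rates, and the 0.8-to-0.95 gate range come from the quoted text.

```python
# Minimal sketch of the reported training setup (not the authors' implementation).
# Assumptions: placeholder model, illustrative epoch count, linear gate ramp.
import torch

model = torch.nn.Linear(512, 10)   # placeholder backbone/classifier
num_epochs = 100                    # illustrative; the quoted setup does not fix this

# SGD with 0.9 momentum; cosine schedule decays the LR from its initial value
# (0.05 for Amazon Review / Digit-5, 0.005 for Office-Caltech10 / DomainNet) to zero.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs, eta_min=0.0
)

def confidence_gate(epoch, begin=0.8, end=0.95):
    """Gradually raise the confidence gate from `begin` to `end` over training
    (the ramp shape here is an assumption; the paper only says it increases)."""
    progress = min(epoch / max(num_epochs - 1, 1), 1.0)
    return begin + (end - begin) * progress

for epoch in range(num_epochs):
    gate = confidence_gate(epoch)
    # ... run one training epoch, keeping only pseudo-labels whose
    # confidence exceeds `gate` ...
    scheduler.step()
```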