AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation
Authors: Zihao Tang, Zheqi Lv, Shengyu Zhang, Yifan Zhou, Xinyu Duan, Fei Wu, Kun Kuang
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in 3 datasets and 8 settings demonstrate the stability and superiority of our approach. |
| Researcher Affiliation | Collaboration | Zihao Tang, Zheqi Lv, Shengyu Zhang (Zhejiang University, {tangzihao, zheqilv, sy_zhang}@zju.edu.cn); Yifan Zhou (Shanghai Jiao Tong University, geniuszhouyifan@gmail.com); Xinyu Duan (Huawei Cloud, duanxinyu@huawei.com); Fei Wu & Kun Kuang (Zhejiang University, {wufei, kunkuang}@zju.edu.cn) |
| Pseudocode | Yes | For space issues, we leave the pseudo-code of our overall method in Appendix A. In Appendix A: "The pseudo-code of our proposed method is displayed in Algorithm 1." |
| Open Source Code | Yes | Code available at https://github.com/IshiKura-a/AuG-KD |
| Open Datasets | Yes | The proposed method is evaluated on 3 datasets: Office-31 (Saenko et al., 2010), Office-Home (Venkateswara et al., 2017), and VisDA-2017 (Peng et al., 2017). |
| Dataset Splits | Yes | For evaluation purposes, the student domain Ds of Office-31 and Office-Home is divided into training, validation, and testing sets using a fixed seed, with proportions of 8:1:1. For VisDA-2017, the validation domain is split into 80% training and 20% validation, and the test domain is used directly for testing. (A split sketch follows the table.) |
| Hardware Specification | Yes | Each experiment is conducted using a single NVIDIA GeForce RTX 3090 and takes approximately 1 day to complete. |
| Software Dependencies | No | The paper mentions 'Optimizer Adam' and shows PyTorch-like code structures, but does not provide specific version numbers for any key software components or libraries. |
| Experiment Setup | Yes | We summarize the hyperparameters and training schedules of AuG-KD on the three datasets in Table 5. Table 5 lists: Optimizer Adam; Learning Rate (except Encoder) 1e-3; Learning Rate (Encoder) 1e-4; Batch Size 2048; N_z 256; Image Resolution 32×32; seed {2021, 2022, ..., 2025}; α_g 20; α_e 0.00025; α_a 0.25; β_a 0.1. Notably, the temperature of the KL-divergence in Module 3 is set to 10. (A loss sketch follows the table.) |
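
As a companion to the Dataset Splits row, here is a minimal sketch of the seeded 8:1:1 train/validation/test split described in the paper, assuming PyTorch; the function name and dataset variable are illustrative and not taken from the authors' repository.

```python
import torch
from torch.utils.data import random_split

def split_student_domain(dataset, seed: int):
    """Split a student-domain dataset into 8:1:1 train/val/test with a fixed seed."""
    n = len(dataset)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    n_test = n - n_train - n_val  # remainder absorbs rounding error
    generator = torch.Generator().manual_seed(seed)  # e.g. one of {2021, ..., 2025}
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)

# Hypothetical usage with one of the paper's seeds:
# train_set, val_set, test_set = split_student_domain(office31_student_domain, seed=2021)
```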
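
The Experiment Setup row reports a KL-divergence temperature of 10 in Module 3. Below is a hedged sketch of a standard temperature-scaled distillation loss consistent with that setting, assuming PyTorch; the KL direction and the T² rescaling are conventional Hinton-style KD choices, not confirmed details of the authors' implementation.

```python
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, temperature: float = 10.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures (standard KD scaling)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```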