AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation

Authors: Zihao Tang, Zheqi Lv, Shengyu Zhang, Yifan Zhou, Xinyu Duan, Fei Wu, Kun Kuang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments in 3 datasets and 8 settings demonstrate the stability and superiority of our approach."
Researcher Affiliation | Collaboration | Zihao Tang, Zheqi Lv, Shengyu Zhang (Zhejiang University) {tangzihao, zheqilv, sy_zhang}@zju.edu.cn; Yifan Zhou (Shanghai Jiao Tong University) geniuszhouyifan@gmail.com; Xinyu Duan (Huawei Cloud) duanxinyu@huawei.com; Fei Wu & Kun Kuang (Zhejiang University) {wufei, kunkuang}@zju.edu.cn
Pseudocode | Yes | "For space issues, we leave the pseudo-code of our overall method in Appendix A." Appendix A states: "The pseudo-code of our proposed method is displayed in Algorithm 1."
Open Source Code | Yes | Code is available at https://github.com/IshiKura-a/AuG-KD
Open Datasets | Yes | The proposed method is evaluated on 3 public datasets: Office-31 (Saenko et al., 2010), Office-Home (Venkateswara et al., 2017), and VisDA-2017 (Peng et al., 2017).
Dataset Splits | Yes | "For evaluation purposes, the student domain Ds of these two datasets is divided into training, validation, and testing sets using a seed, with proportions set at 8:1:1 respectively. As to VisDA-2017, we split the validation domain into 80% training and 20% validation and directly use the test domain for testing." (See the split sketch after the table.)
Hardware Specification | Yes | Each experiment is conducted on a single NVIDIA GeForce RTX 3090 and takes approximately 1 day to complete.
Software Dependencies | No | The paper mentions "Optimizer Adam" and shows PyTorch-like code structures, but does not provide specific version numbers for any key software components or libraries.
Experiment Setup | Yes | "We summarize the hyperparameters and training schedules of AuG-KD on the three datasets in Table 5." Table 5 lists: Optimizer Adam; Learning Rate (except Encoder) 1e-3; Learning Rate (Encoder) 1e-4; Batch size 2048; N_z 256; Image Resolution 32×32; seed {2021, 2022, ..., 2025}; αg 20; αe 0.00025; αa 0.25; βa 0.1. Notably, the temperature of the KL-divergence in Module 3 is set to 10. (See the training-setup sketch after the table.)
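
To make the Dataset Splits row concrete, here is a minimal PyTorch sketch of the seeded 8:1:1 split it describes. It assumes the student-domain data is already wrapped in a torch.utils.data.Dataset; `student_dataset` and `split_student_domain` are illustrative names, not identifiers from the paper's code.

```python
import torch
from torch.utils.data import random_split

def split_student_domain(student_dataset, seed=2021):
    """Seeded 8:1:1 train/val/test split of the student domain Ds.

    `student_dataset` is a placeholder for any torch.utils.data.Dataset,
    e.g. an ImageFolder over Office-31 or Office-Home images.
    """
    n = len(student_dataset)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    n_test = n - n_train - n_val  # remainder absorbs rounding error
    generator = torch.Generator().manual_seed(seed)
    return random_split(student_dataset, [n_train, n_val, n_test],
                        generator=generator)
```

For VisDA-2017, the row instead describes an 80/20 train/val split of the validation domain, with the official test domain used as-is; the same `random_split` pattern with two lengths would apply there.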
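
The Experiment Setup row pins down the optimizer, two learning rates, and a KL temperature of 10. The sketch below shows a standard temperature-scaled KL distillation term together with an Adam optimizer using the two learning rates from Table 5. The modules, dimensions, and loss weighting are placeholders: the report does not restate the full AuG-KD objective, only the hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 10.0  # KL-divergence temperature the paper reports for Module 3

def kd_kl_loss(student_logits, teacher_logits, temperature=T):
    # Standard temperature-scaled KL distillation term; the T**2 factor
    # keeps gradient magnitudes comparable across temperatures.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2

# Placeholder modules standing in for the paper's student and encoder.
student = nn.Linear(256, 31)            # e.g., 31 Office-31 classes (illustrative)
encoder = nn.Linear(3 * 32 * 32, 256)   # N_z = 256 latent dim, 32x32 inputs

# Adam with the two learning rates listed in Table 5:
# 1e-4 for the encoder, 1e-3 for the remaining parameters.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-4},
    {"params": student.parameters(), "lr": 1e-3},
])
```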