Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts
Authors: Tao Zhong, Zhixiang Chi, Li Gu, Yang Wang, Yuanhao Yu, Jin Tang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art and validates the effectiveness of each proposed component. Table 1: Comparison with the state-of-the-arts on the WILDS image testbeds under the out-of-distribution setting. Metric means and standard deviations are reported across replicates. |
| Researcher Affiliation | Collaboration | Tao Zhong^1, Zhixiang Chi^2, Li Gu^2, Yang Wang^{2,3}, Yuanhao Yu^2, Jin Tang^2; 1 University of Toronto, 2 Huawei Noah's Ark Lab, 3 Concordia University |
| Pseudocode | Yes | Algorithm 1: Training for Meta-DMoE |
| Open Source Code | Yes | Our code is available at https://github.com/n3il666/Meta-DMoE. |
| Open Datasets | Yes | WILDS [39], DomainNet [58], PACS [44], iWildCam [10], Camelyon17 [7], RxRx1 [69], FMoW [18], PovertyMap [83], ImageNet [21] |
| Dataset Splits | Yes | Specifically, we first split the data samples in each source domain D^S_i into disjoint support and query sets. The unlabeled support set (x^{SU}) is used to perform adaptation via knowledge distillation, while the labeled query set (x^Q, y^Q) is used to evaluate the adapted parameters to explicitly test generalization on unseen data. |
| Hardware Specification | No | The paper states that hardware specifications are included in the supplemental material, but no specific hardware details (e.g., GPU models, CPU types) are provided in the main text. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' but does not specify any software versions for libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | After that, the model is further trained using Alg. 1 for 15 epochs with a fixed learning rate of 3e-4 for α and 3e-5 for β. During meta-testing, we use Line 13 of Alg. 1 to adapt before making a prediction for every testing domain. Specifically, we set the number of examples for adaptation at test time to {24, 64, 75, 64, 64} for iWildCam, Camelyon17, RxRx1, FMoW and PovertyMap, respectively. For both meta-training and testing, we perform one gradient update for adaptation on the unseen target domain. |
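The one-gradient-update, unlabeled test-time adaptation described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `experts`, `aggregator`, and the MSE feature-distillation loss are assumed stand-ins for the paper's mixture-of-experts teacher and its knowledge-aggregation module.

```python
import copy

import torch
import torch.nn.functional as F


def adapt_one_step(student, experts, aggregator, x_support, lr=3e-4):
    """One gradient update of test-time adaptation by distilling from
    frozen experts on an unlabeled support batch (hypothetical sketch)."""
    # Teacher signal: aggregate the frozen experts' features on the
    # unlabeled support set. No labels are used at adaptation time.
    with torch.no_grad():
        expert_feats = torch.stack([e(x_support) for e in experts])  # (E, B, D)
        teacher = aggregator(expert_feats)                           # (B, D)

    # Adapt a copy so every test domain starts from the meta-learned init.
    adapted = copy.deepcopy(student)
    opt = torch.optim.Adam(adapted.parameters(), lr=lr)

    # Single distillation step: match student features to the teacher's.
    loss = F.mse_loss(adapted(x_support), teacher)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return adapted
```

A usage sketch: call `adapt_one_step` once per test domain with that domain's unlabeled batch (e.g. 24 examples for iWildCam), then predict with the returned adapted model while the original student stays untouched.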