Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts

Authors: Tao Zhong, Zhixiang Chi, Li Gu, Yang Wang, Yuanhao Yu, Jin Tang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art and validates the effectiveness of each proposed component." (Section 5, Experiments; Table 1: "Comparison with the state-of-the-arts on the WILDS image testbeds and out-of-distribution setting. Metric means and standard deviations are reported across replicates.")
Researcher Affiliation: Collaboration. Tao Zhong (1), Zhixiang Chi (2), Li Gu (2), Yang Wang (2,3), Yuanhao Yu (2), Jin Tang (2); affiliations: (1) University of Toronto, (2) Huawei Noah's Ark Lab, (3) Concordia University.
Pseudocode: Yes. "Algorithm 1: Training for Meta-DMoE"
Open Source Code: Yes. "Our code is available at https://github.com/n3il666/Meta-DMoE."
Open Datasets: Yes. WILDS [39], DomainNet [58] and PACS [44]; iWildCam [10], Camelyon17 [7], RxRx1 [69], FMoW [18] and PovertyMap [83]; ImageNet [21].
Dataset Splits: Yes. "Specifically, we first split the data samples in each source domain D_S^i into disjoint support and query sets. The unlabeled support set (x^SU) is used to perform adaptation via knowledge distillation, while the labeled query set (x^Q, y^Q) is used to evaluate the adapted parameters to explicitly test the generalization on unseen data."
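The quoted per-domain split can be sketched as follows. This is a minimal illustration, not the authors' code: the 50/50 split ratio and the fixed seed are assumptions, since the paper's quoted text does not state them.

```python
import random

def split_domain(samples, support_frac=0.5, seed=0):
    """Split one source domain into disjoint support and query sets.

    Support set: inputs only (labels dropped), used for unsupervised
    adaptation via knowledge distillation. Query set: (input, label)
    pairs, used to evaluate the adapted parameters on held-out data.
    `support_frac` and `seed` are illustrative assumptions.
    """
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * support_frac)
    support = [x for x, _ in samples[:cut]]   # unlabeled support x^SU
    query = samples[cut:]                     # labeled query (x^Q, y^Q)
    return support, query

# Toy usage: 10 (input, label) pairs from one source domain.
data = [(f"img{i}", i % 3) for i in range(10)]
sup, qry = split_domain(data)
```

Because the two sets are cut from one shuffled list, every sample lands in exactly one of them, matching the "disjoint" requirement in the quote.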
Hardware Specification: No. The paper states that hardware specifications are included in the supplemental material, but no specific hardware details (e.g., GPU models, CPU types) are provided in the main text.
Software Dependencies: No. The paper mentions using the Adam optimizer but does not specify versions for any libraries, frameworks, or programming languages.
Experiment Setup: Yes. "After that, the model is further trained using Alg. 1 for 15 epochs with a fixed learning rate of 3e-4 for α and 3e-5 for β. During meta-testing, we use Line 13 of Alg. 1 to adapt before making a prediction for every testing domain. Specifically, we set the number of examples for adaptation at test time to {24, 64, 75, 64, 64} for iWildCam, Camelyon17, RxRx1, FMoW and PovertyMap, respectively. For both meta-training and testing, we perform one gradient update for adaptation on the unseen target domain."
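The quoted test-time procedure, a single gradient update that distills the teacher's output on unlabeled support examples into the student, can be illustrated with a minimal scalar sketch. The linear model, squared-error distillation loss, and plain gradient step are simplifying assumptions for illustration; the paper uses deep networks, mixture-of-experts teachers, and the Adam optimizer.

```python
def distill_loss(w, xs, ts):
    """Squared-error distillation loss between a scalar student w*x
    and the teacher's outputs ts on unlabeled inputs xs."""
    return 0.5 * sum((w * x - t) ** 2 for x, t in zip(xs, ts)) / len(xs)

def adapt_one_step(student_w, support_x, teacher_out, lr=3e-4):
    """One knowledge-distillation gradient update of the student,
    mirroring the paper's single adaptation step per target domain.
    Gradient of 0.5 * mean((w*x - t)^2) with respect to w."""
    n = len(support_x)
    grad = sum((student_w * x - t) * x
               for x, t in zip(support_x, teacher_out)) / n
    return student_w - lr * grad

# Unlabeled support inputs from an unseen target domain and the
# teacher's outputs on them (the teacher here acts like w = 2).
xs = [1.0, 2.0, 3.0]
ts = [2.0, 4.0, 6.0]
w0 = 0.0
w1 = adapt_one_step(w0, xs, ts)  # one update, then predict
```

After the single update the student's distillation loss on the support set strictly decreases; at test time this adaptation is rerun from the meta-trained initialization for every target domain before prediction.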