Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts
Authors: Tao Zhong, Zhixiang Chi, Li Gu, Yang Wang, Yuanhao Yu, Jin Tang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art and validates the effectiveness of each proposed component. Table 1: Comparison with the state-of-the-arts on the WILDS image testbeds under the out-of-distribution setting. Metric means and standard deviations are reported across replicates. |
| Researcher Affiliation | Collaboration | Tao Zhong^1, Zhixiang Chi^2, Li Gu^2, Yang Wang^{2,3}, Yuanhao Yu^2, Jin Tang^2; 1 University of Toronto, 2 Huawei Noah's Ark Lab, 3 Concordia University |
| Pseudocode | Yes | Algorithm 1: Training for Meta-DMoE |
| Open Source Code | Yes | Our code is available at https://github.com/n3il666/Meta-DMoE. |
| Open Datasets | Yes | WILDS [39], DomainNet [58], PACS [44], iWildCam [10], Camelyon17 [7], RxRx1 [69], FMoW [18], PovertyMap [83], ImageNet [21] |
| Dataset Splits | Yes | Specifically, we first split the data samples in each source domain D^S_i into disjoint support and query sets. The unlabeled support set (x^{SU}) is used to perform adaptation via knowledge distillation, while the labeled query set (x^Q, y^Q) is used to evaluate the adapted parameters to explicitly test generalization on unseen data. |
| Hardware Specification | No | The paper states that hardware specifications are included in the supplemental material, but no specific hardware details (e.g., GPU models, CPU types) are provided in the main text. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' but does not specify any software versions for libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | After that, the model is further trained using Alg. 1 for 15 epochs with a fixed learning rate of 3e-4 for α and 3e-5 for β. During meta-testing, we use Line 13 of Alg. 1 to adapt before making a prediction for every testing domain. Specifically, we set the number of examples for adaptation at test time to {24, 64, 75, 64, 64} for iWildCam, Camelyon17, RxRx1, FMoW and PovertyMap, respectively. For both meta-training and testing, we perform one gradient update for adaptation on the unseen target domain. |
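The one-gradient-update, unlabeled test-time adaptation described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `experts`, `aggregator`, and the MSE feature-distillation loss are assumed stand-ins for the paper's mixture-of-experts teacher and its knowledge-aggregation module.

```python
import copy

import torch
import torch.nn.functional as F


def adapt_one_step(student, experts, aggregator, x_support, lr=3e-4):
    """One gradient update of test-time adaptation by distilling from
    frozen experts on an unlabeled support batch (hypothetical sketch)."""
    # Teacher signal: aggregate the frozen experts' features on the
    # unlabeled support set. No labels are used at adaptation time.
    with torch.no_grad():
        expert_feats = torch.stack([e(x_support) for e in experts])  # (E, B, D)
        teacher = aggregator(expert_feats)                           # (B, D)

    # Adapt a copy so every test domain starts from the meta-learned init.
    adapted = copy.deepcopy(student)
    opt = torch.optim.Adam(adapted.parameters(), lr=lr)

    # Single distillation step: match student features to the teacher's.
    loss = F.mse_loss(adapted(x_support), teacher)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return adapted
```

A usage sketch: call `adapt_one_step` once per test domain with that domain's unlabeled batch (e.g. 24 examples for iWildCam), then predict with the returned adapted model while the original student stays untouched.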