Sparse Mixture-of-Experts are Domain Generalizable Learners

Authors: Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on DomainBed demonstrate that GMoE trained with ERM outperforms SOTA DG baselines by a large margin.
Researcher Affiliation | Collaboration | Bo Li (1), Yifei Shen (2), Jingkang Yang (1), Yezhen Wang (3), Jiawei Ren (1), Tong Che (3,4), Jun Zhang (2), Ziwei Liu (1); (1) S-Lab, Nanyang Technological University; (2) The Hong Kong University of Science and Technology; (3) Mila - Quebec AI Institute; (4) NVIDIA Research
Pseudocode | Yes | Algorithm 1 (Conditional Statements): define intervals I_i ⊂ R for i = 1, ..., M and functions h_i for i = 1, ..., M + 1; then, switching on h_1(x), if h_1(x) ∈ I_i, apply h_{i+1} to x. (A minimal code sketch of this conditional statement appears below the table.)
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for its proposed methodology is publicly available.
Open Datasets | Yes | In this subsection, we evaluate GMoE on DomainBed (Gulrajani & Lopez-Paz, 2021) with 8 benchmark datasets: PACS, VLCS, OfficeHome, TerraIncognita, DomainNet, SVIRO, Wilds-Camelyon and Wilds-FMOW. Detailed information on datasets and evaluation protocols are provided in Appendix D.1.
Dataset Splits | Yes | For train-validation selection, we split each training domain into training and validation subsets. Then, we pool the validation subsets of each training domain to create an overall validation set. Finally, we choose the model maximizing the accuracy on the overall validation set, and report the final accuracy on one leave-out test domain. (A minimal sketch of this split-and-select protocol appears below the table.)
Hardware Specification | No | The paper mentions 'computational overhead' and 'flops' for models, and reports 'Step Time (s)' and 'Run-time Memory (GB)' in Table 16, but it does not specify the exact GPU models, CPU models, or other hardware components used for running the experiments.
Software Dependencies | No | The paper states, 'We optimize models using Adam optimizer (Kingma & Ba, 2015)...' but does not provide specific version numbers for Python, PyTorch, TensorFlow, CUDA, or other relevant software libraries.
Experiment Setup | Yes | We optimize models using Adam optimizer (Kingma & Ba, 2015) with slightly different parameters on different datasets (see Table 8). The training and inference batch size is set to 32 for each domain. Table 8: Hyperparameters (Learning Rate, Weight Decay) to reproduce the best performance of GMoE on each dataset. (A minimal optimizer-setup sketch appears below the table.)
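
For concreteness, here is a minimal Python sketch of the conditional statement summarized in the Pseudocode row; the interval bounds, routing function, and branch functions are illustrative placeholders, not code from the paper.

# Hypothetical illustration of Algorithm 1's conditional statement: h_1 routes
# the input and the matching interval selects which of h_2, ..., h_{M+1} is applied.
from typing import Callable, List, Tuple

def conditional_statement(
    x: float,
    intervals: List[Tuple[float, float]],     # I_1, ..., I_M (subsets of R, half-open ranges here)
    h: List[Callable[[float], float]],        # h_1, ..., h_{M+1}
) -> float:
    route = h[0](x)                           # h_1(x)
    for i, (lo, hi) in enumerate(intervals):  # test h_1(x) in I_i
        if lo <= route < hi:
            return h[i + 1](x)                # apply h_{i+1} to x
    raise ValueError("h_1(x) falls outside every interval I_i")

# Example: two intervals; the router is the identity, the branches square or negate x.
y = conditional_statement(
    0.3,
    intervals=[(0.0, 0.5), (0.5, 1.0)],
    h=[lambda v: v, lambda v: v**2, lambda v: -v],
)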
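
The Dataset Splits row describes DomainBed's train-validation model selection. The sketch below illustrates that protocol under the assumption that each training domain is available as a separate PyTorch dataset; the function and variable names are hypothetical, not DomainBed's own API.

# Hedged sketch of train-validation model selection: split every training
# domain, pool the validation subsets, and select the model that maximizes
# accuracy on the pooled validation set before testing on the left-out domain.
import torch
from torch.utils.data import ConcatDataset, random_split

def split_training_domains(domain_datasets, val_fraction=0.2, seed=0):
    generator = torch.Generator().manual_seed(seed)
    train_sets, val_sets = [], []
    for dataset in domain_datasets:                      # one dataset per training domain
        n_val = int(len(dataset) * val_fraction)
        train_part, val_part = random_split(
            dataset, [len(dataset) - n_val, n_val], generator=generator
        )
        train_sets.append(train_part)
        val_sets.append(val_part)
    return train_sets, ConcatDataset(val_sets)           # pooled (overall) validation set

# Model selection: keep the checkpoint with the highest accuracy on the pooled
# validation set, then report that checkpoint's accuracy on the leave-out test domain.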
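
The Experiment Setup row reports Adam with dataset-specific learning rate and weight decay and a batch size of 32 per domain. The sketch below shows that configuration in PyTorch; the hyperparameter values are placeholders standing in for the per-dataset entries of Table 8.

# Minimal optimizer-setup sketch; lr and weight_decay are placeholder defaults,
# since the paper's Table 8 lists dataset-specific values.
import torch

def make_optimizer(model: torch.nn.Module, lr: float = 5e-5, weight_decay: float = 1e-6):
    # Adam (Kingma & Ba, 2015) with per-dataset learning rate / weight decay.
    return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

BATCH_SIZE_PER_DOMAIN = 32  # used for both training and inference, per domain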