Sparse Mixture-of-Experts are Domain Generalizable Learners
Authors: Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on DomainBed demonstrate that GMoE trained with ERM outperforms SOTA DG baselines by a large margin. |
| Researcher Affiliation | Collaboration | Bo Li (1), Yifei Shen (2), Jingkang Yang (1), Yezhen Wang (3), Jiawei Ren (1), Tong Che (3,4), Jun Zhang (2), Ziwei Liu (1). Affiliations: (1) S-Lab, Nanyang Technological University; (2) The Hong Kong University of Science and Technology; (3) Mila-Quebec AI Institute; (4) NVIDIA Research |
| Pseudocode | Yes | Algorithm 1 (Conditional Statements): define intervals I_i ⊂ R, i = 1, ..., M, and functions h_i, i = 1, ..., M+1; switch on h_1(x): if h_1(x) ∈ I_i, then apply h_{i+1} to x. A runnable sketch of this branching scheme appears after the table. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for their proposed methodology is publicly available. |
| Open Datasets | Yes | In this subsection, we evaluate GMoE on DomainBed (Gulrajani & Lopez-Paz, 2021) with 8 benchmark datasets: PACS, VLCS, OfficeHome, TerraIncognita, DomainNet, SVIRO, Wilds-Camelyon and Wilds-FMOW. Detailed information on datasets and evaluation protocols is provided in Appendix D.1. |
| Dataset Splits | Yes | For train-validation selection, we split each training domain into training and validation subsets. Then, we pool the validation subsets of each training domain to create an overall validation set. Finally, we choose the model maximizing the accuracy on the overall validation set, and report the final accuracy on one leave-out test domain. (A sketch of this selection procedure appears after the table.) |
| Hardware Specification | No | The paper mentions 'computational overhead' and 'flops' for models, and reports 'Step Time (s)' and 'Run-time Memory (GB)' in Table 16, but it does not specify the exact GPU models, CPU models, or other hardware components used for running the experiments. |
| Software Dependencies | No | The paper states, 'We optimize models using Adam optimizer (Kingma & Ba, 2015)...' but does not provide specific version numbers for Adam, Python, PyTorch, TensorFlow, CUDA, or other relevant software libraries. |
| Experiment Setup | Yes | We optimize models using the Adam optimizer (Kingma & Ba, 2015) with slightly different parameters on different datasets (see Table 8). The training and inference batch size is set to 32 for each domain. Table 8 lists, per dataset, the learning rate and weight decay that reproduce the best performance of GMoE. (A sketch of this setup appears after the table.) |
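
The conditional-statement pseudocode quoted in the Pseudocode row maps naturally onto sparse expert routing. The sketch below is our own minimal illustration, not the authors' code: the first function implements the interval-based branching literally (for a single scalar input), and the toy top-1 sparse-MoE layer mirrors the same structure, with a learned gate playing the role of h_1 and the experts playing the roles of h_2, ..., h_{M+1}.

```python
# Minimal sketch of "Algorithm 1: Conditional Statements" and a toy top-1
# sparse-MoE layer that mirrors the same branch-then-apply structure.
# Names (intervals, experts, gate) are illustrative, not the paper's code.
import torch
import torch.nn as nn


def conditional_statement(x, intervals, branch_fns, h1):
    """Apply branch_fns[i] (i.e. h_{i+1}) to a scalar x when h1(x) lies in interval I_i."""
    key = h1(x)
    for (lo, hi), fn in zip(intervals, branch_fns):
        if lo <= key < hi:
            return fn(x)
    raise ValueError("h1(x) falls outside every interval")


class Top1MoE(nn.Module):
    """Toy top-1 sparse MoE: a learned gate picks one expert per input,
    the soft, learned analogue of the interval-based branching above."""

    def __init__(self, dim, num_experts):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)          # plays the role of h_1
        self.experts = nn.ModuleList(                    # play the roles of h_2..h_{M+1}
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (batch, dim)
        expert_idx = self.gate(x).argmax(dim=-1)         # hard top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```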
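The train-validation selection procedure quoted in the Dataset Splits row (split each training domain, pool the per-domain validation subsets, select the model with the best pooled validation accuracy, report on the leave-out test domain) can be sketched as below. This is a minimal illustration assuming standard PyTorch dataset objects and a user-supplied evaluate(model, dataset) helper; it is not the DomainBed implementation, and the 20% holdout fraction is an assumption.

```python
# Hedged sketch of train-domain validation model selection as described above.
import torch
from torch.utils.data import ConcatDataset, random_split


def make_splits(train_domains, holdout_fraction=0.2, seed=0):
    """Split every training domain and pool the validation subsets."""
    train_parts, val_parts = [], []
    for domain in train_domains:
        n_val = int(len(domain) * holdout_fraction)
        train_d, val_d = random_split(
            domain, [len(domain) - n_val, n_val],
            generator=torch.Generator().manual_seed(seed),
        )
        train_parts.append(train_d)
        val_parts.append(val_d)
    return ConcatDataset(train_parts), ConcatDataset(val_parts)


def select_and_report(checkpoints, val_set, test_domain, evaluate):
    """Pick the checkpoint with the best pooled-validation accuracy,
    then report its accuracy on the single leave-out test domain."""
    best = max(checkpoints, key=lambda model: evaluate(model, val_set))
    return evaluate(best, test_domain)
```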
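The experiment setup quoted in the last row (Adam optimizer, batch size 32 per domain, per-dataset learning rate and weight decay from Table 8) can be mirrored with a short configuration sketch. The hyperparameter values below are placeholders for illustration only, not the numbers reported in Table 8 of the paper.

```python
# Sketch of the reported training setup: Adam with per-dataset learning rate
# and weight decay, and one data loader per training domain with batch size 32.
import torch
from torch.utils.data import DataLoader

HPARAMS = {  # placeholder values; consult Table 8 of the paper for the real ones
    "PACS": {"lr": 5e-5, "weight_decay": 0.0},
    "OfficeHome": {"lr": 5e-5, "weight_decay": 1e-6},
}


def build_training(model, train_domains, dataset_name):
    hp = HPARAMS[dataset_name]
    optimizer = torch.optim.Adam(
        model.parameters(), lr=hp["lr"], weight_decay=hp["weight_decay"]
    )
    # one loader per training domain, each with batch size 32
    loaders = [
        DataLoader(d, batch_size=32, shuffle=True, drop_last=True)
        for d in train_domains
    ]
    return optimizer, loaders
```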