FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
Authors: Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, Suchi Saria
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce FuseMoE, a mixture-of-experts framework...The practical utility of FuseMoE in the real world is validated by a diverse set of challenging prediction tasks. In this section, we provide a theoretical guarantee of the benefits of the Laplace gating over the standard Softmax gating in MoE. *(A gating sketch follows this table.)* |
| Researcher Affiliation | Academia | Department of Computer Science, Johns Hopkins University; Department of Statistics and Data Sciences, The University of Texas at Austin; Department of Biomedical Engineering, Johns Hopkins University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks labeled 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | we have submitted the implementation of the proposed methods and all baselines in the supplementary material. |
| Open Datasets | Yes | We tested FuseMoE on a diverse set of benchmarks, including MIMIC-III [40] and MIMIC-IV [36], CMU-MOSI and MOSEI [97], the Physical Activity Monitoring (PAM) dataset [75], and CIFAR-10 [46]. |
| Dataset Splits | Yes | We allocated 70 percent of the data for model training, with the remaining 30 percent evenly split between validation and testing. The CMU-MOSI dataset contains 1284/229/686 train/validation/test samples, and the CMU-MOSEI dataset contains 16326/1871/4659 train/validation/test samples. |
| Hardware Specification | Yes | We train models using a Lambda Workstation with four A5500 GPUs with 24 GB of memory. |
| Software Dependencies | No | Our methodology employs pre-trained T5 [73] for text encoding, librosa [56] for audio feature extraction, and EfficientNet [84] for video feature encoding. For radiological notes, we obtained 768-dimensional embeddings using the BioClinicalBERT model [2]. *(An embedding sketch follows this table.)* |
| Experiment Setup | Yes | Table 8: Hyperparameters used for MoE framework and general architecture. |
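The Research Type row quotes the paper's central modeling claim: a theoretical advantage of Laplace gating over standard Softmax gating in MoE routing. Below is a minimal sketch of that distinction, assuming the Laplace gate scores each expert by the negative Euclidean distance between the token representation and a per-expert gating vector, normalized softmax-style; the class name, top-k choice, and hyperparameters are illustrative, not taken from the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LaplaceGate(nn.Module):
    """MoE router sketch that scores experts with a Laplace
    (negative-distance) gate instead of the usual inner-product
    softmax gate."""

    def __init__(self, d_model: int, num_experts: int, top_k: int = 2):
        super().__init__()
        # One gating vector per expert; distance to it scores that expert.
        self.gate_vectors = nn.Parameter(torch.randn(num_experts, d_model))
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (batch, d_model) -> distances: (batch, num_experts)
        distances = torch.cdist(x, self.gate_vectors)
        # Laplace gating: softmax over negative Euclidean distances.
        # A standard softmax gate would instead use x @ self.gate_vectors.T.
        scores = F.softmax(-distances, dim=-1)
        # Sparse routing: keep the top-k experts per token, renormalize.
        weights, expert_idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        return weights, expert_idx


if __name__ == "__main__":
    gate = LaplaceGate(d_model=64, num_experts=8, top_k=2)
    tokens = torch.randn(4, 64)
    w, idx = gate(tokens)
    print(w.shape, idx.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```

Everything downstream of the score computation (top-k selection, renormalization, expert dispatch) is identical for the two gates; only the scoring rule changes.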
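The Dataset Splits row quotes a 70/15/15 protocol for the MIMIC tasks: 70 percent for training, with the remaining 30 percent split evenly between validation and testing. A minimal sketch of that protocol, assuming a random split with a fixed seed (the seed value and helper name are illustrative):

```python
import torch
from torch.utils.data import Dataset, random_split


def split_70_15_15(dataset: Dataset, seed: int = 0):
    """Split into 70% train, 15% validation, 15% test, as quoted above.
    Rounding puts any leftover samples in the test partition."""
    n = len(dataset)
    n_train = int(0.7 * n)
    n_val = (n - n_train) // 2
    n_test = n - n_train - n_val
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```

The CMU-MOSI and CMU-MOSEI splits quoted in the same row are fixed sample counts shipped with those benchmarks, so they need no such helper.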
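The Software Dependencies row names the paper's feature extractors but pins no versions. As an illustration of the radiology-note step, here is a minimal sketch using Hugging Face `transformers` to obtain 768-dimensional BioClinicalBERT embeddings; pooling via the [CLS] token is an assumption on our part, since the quoted excerpt does not specify the pooling.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Bio_ClinicalBERT, as cited in the paper for radiological notes.
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")


def embed_note(note: str) -> torch.Tensor:
    """Return a 768-dimensional embedding for one radiology note.
    [CLS]-token pooling is an assumption, not the paper's stated choice."""
    inputs = tokenizer(note, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden[:, 0].squeeze(0)                  # [CLS] embedding
```

The T5, librosa, and EfficientNet extraction steps for the other modalities would follow the same pattern: a pre-trained encoder mapping each raw input to a fixed-dimensional vector before fusion.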