FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion

Authors: Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, Suchi Saria

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce FuseMoE, a mixture-of-experts framework... The practical utility of FuseMoE in the real world is validated by a diverse set of challenging prediction tasks. In this section, we provide a theoretical guarantee of the benefits of the Laplace gating over the standard Softmax gating in MoE. 4 Experiments
Researcher Affiliation | Academia | Department of Computer Science, Johns Hopkins University; Department of Statistics and Data Sciences, The University of Texas at Austin; Department of Biomedical Engineering, Johns Hopkins University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | we have submitted the implementation of the proposed methods and all baselines in the supplementary material.
Open Datasets | Yes | We tested FuseMoE on a diverse set of benchmarks, including MIMIC-III [40] and MIMIC-IV [36], CMU-MOSI and MOSEI [97], the Physical Activity Monitoring (PAM) dataset [75], and CIFAR-10 [46].
Dataset Splits | Yes | We allocated 70 percent of the data for model training, with the remaining 30 percent evenly split between validation and testing. The CMU-MOSI dataset contains 1284/229/686 train/validation/test samples, and the CMU-MOSEI dataset contains 16326/1871/4659 train/validation/test samples.
Hardware Specification | Yes | We train models using a Lambda Workstation with four A550 GPUs with 24 GB of memory.
Software Dependencies | No | Our methodology employs pre-trained T5 [73] for text encoding, librosa [56] for audio feature extraction, and EfficientNet [84] for video feature encoding. For radiological notes, we obtained 768-dimensional embeddings using the BioClinicalBERT model [2].
Experiment Setup | Yes | Table 8: Hyperparameters used for MoE framework and general architecture.
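
The Dataset Splits entry describes a 70 percent training allocation with the remaining 30 percent divided evenly between validation and testing. A minimal sketch of such a split is shown below. It is an illustration only, not the authors' code: the function name `split_indices`, the shuffling step, the seed, and the sample count of 1000 are all assumptions introduced here.

```python
import random


def split_indices(n_samples, train_frac=0.7, seed=0):
    """Hypothetical helper: shuffle sample indices and split them into
    train/validation/test, with the non-training part halved evenly."""
    indices = list(range(n_samples))
    # Shuffling with a fixed seed is an assumption; the source does not
    # say how samples were assigned to each split.
    random.Random(seed).shuffle(indices)

    n_train = round(n_samples * train_frac)
    n_val = (n_samples - n_train) // 2  # half of the remainder

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # takes any leftover odd sample
    return train, val, test


# Illustrative sample count, not taken from any dataset in the paper.
train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

When the remainder is odd, this sketch gives the extra sample to the test set. The fixed CMU-MOSI and CMU-MOSEI counts quoted in the table do not follow the 70/15/15 ratio, so a ratio-based helper like this one would not reproduce them.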