Coupled Mamba: Enhanced Multimodal Fusion with Coupled State Space Model

Authors: Wenbing Li, Hang Zhou, Junqing Yu, Zikai Song, Wei Yang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on CMU-MOSEI, CH-SIMS, CH-SIMSV2, BRCA, and MM-IMDB through multi-domain input verify the effectiveness of our model compared to current state-of-the-art methods, improving the F1-score by 0.4%, 0.9%, and 2.3% on the CMU-MOSEI, CH-SIMS, and CH-SIMSV2 datasets respectively, with 49% faster inference and 83.7% GPU memory savings.
Researcher Affiliation | Academia | Wenbing Li, Hang Zhou, Junqing Yu, Zikai Song, Wei Yang; Huazhong University of Science and Technology; {wenbingli, henrryzh, yjqing, skyesong, weiyangcs}@hust.edu.cn
Pseudocode | Yes | Algorithm 1: Coupled Mamba
Open Source Code | Yes | Code is available at https://github.com/hustcselwb/coupledmamba.
Open Datasets | Yes | We conduct experiments on five benchmark datasets (CMU-MOSEI, CH-SIMS [24], CH-SIMSV2 [25], MM-IMDB, and BRCA).
Dataset Splits | Yes | The CMU-MOSEI dataset is an extension of CMU-MOSI and contains 22,856 samples of movie review video clips. Of these, 16,326 samples are used as the training set, and the remaining 1,871 and 4,659 samples are used as the validation set and test set, respectively.
Hardware Specification | Yes | All experiments were conducted on a Linux workstation equipped with a single NVIDIA 32GB V100 GPU and a 32-core Intel Xeon CPU.
Software Dependencies | Yes | The environment we use is Python 3.10, CUDA 12.1, torch 2.12.
Experiment Setup | Yes | We use a hidden dimension size of 128, an expansion coefficient of 2, a convolution kernel size of 4, = dstate/8 as the configuration of each Mamba block, and a layer number of 3 to train our Coupled Mamba. We use Adam to optimize the model, with a learning rate of 0.0005, a weight decay coefficient of 0.0005, and 150 epochs; the batch size is set to 1024, 128, and 256 on CMU-MOSEI, CH-SIMS, and CH-SIMSV2, respectively. L1 loss is used as the loss function for the regression task, and cross entropy is used as the loss function for the classification task.
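
The CMU-MOSEI split sizes quoted under Dataset Splits can be checked with simple arithmetic. The sketch below is illustrative only: the names `CMU_MOSEI_SPLITS`, `CMU_MOSEI_TOTAL`, and `split_fractions` are hypothetical and are not taken from the authors' code. Only the sample counts come from the text above.

```python
# Split sizes as quoted under Dataset Splits for CMU-MOSEI.
# The names here are hypothetical, chosen for illustration.
CMU_MOSEI_SPLITS = {"train": 16326, "valid": 1871, "test": 4659}
CMU_MOSEI_TOTAL = 22856


def split_fractions(splits):
    """Return each split's share of the total sample count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


# The three splits should account for every sample in the dataset.
assert sum(CMU_MOSEI_SPLITS.values()) == CMU_MOSEI_TOTAL

fractions = split_fractions(CMU_MOSEI_SPLITS)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The three counts sum to the stated total of 22,856, which corresponds to a split of roughly 71% training, 8% validation, and 20% test.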
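
The hyperparameters listed under Experiment Setup can be gathered into a single configuration object. This is a minimal sketch and not the authors' code: the class `TrainConfig`, its field names, and the helpers `batch_size_for` and `loss_for` are all hypothetical. Only the numeric values and the optimizer and loss choices come from the text above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the reported training setup."""
    hidden_dim: int = 128          # hidden dimension size
    expand: int = 2                # expansion coefficient of each Mamba block
    conv_kernel: int = 4           # convolution kernel size
    num_layers: int = 3            # number of layers
    optimizer: str = "adam"
    learning_rate: float = 5e-4
    weight_decay: float = 5e-4
    epochs: int = 150
    # Per-dataset batch sizes as reported.
    batch_sizes: dict = field(default_factory=lambda: {
        "CMU-MOSEI": 1024,
        "CH-SIMS": 128,
        "CH-SIMSV2": 256,
    })


def batch_size_for(cfg: TrainConfig, dataset: str) -> int:
    """Look up the reported batch size for a dataset name."""
    return cfg.batch_sizes[dataset]


def loss_for(task: str) -> str:
    """Map a task type to the loss reported for it."""
    losses = {"regression": "l1", "classification": "cross_entropy"}
    if task not in losses:
        raise ValueError(f"unknown task type: {task!r}")
    return losses[task]


cfg = TrainConfig()
print(batch_size_for(cfg, "CH-SIMS"), loss_for("regression"))
```

In a PyTorch training script these values would typically be passed to `torch.optim.Adam(params, lr=..., weight_decay=...)`, with `torch.nn.L1Loss` or `torch.nn.CrossEntropyLoss` as the criterion. How the authors wire them up in their repository is not stated here.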