MMT: Multi-way Multi-modal Transformer for Multimodal Learning
Authors: Jiajia Tang, Kang Li, Ming Hou, Xuanyu Jin, Wanzeng Kong, Yu Ding, Qibin Zhao
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments demonstrate that MMT can achieve state-of-the-art or comparable performance. (See also Section 4, Experiment Setups.) |
| Researcher Affiliation | Collaboration | Jiajia Tang1, Kang Li1, Ming Hou2, Xuanyu Jin1, Wanzeng Kong1, Yu Ding3 and Qibin Zhao2. 1Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, School of Computer Science and Technology, Hangzhou Dianzi University, China; 2RIKEN Center for Advanced Intelligence Project (AIP), Japan; 3Virtual Human Group, Netease Fuxi AI Lab |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present structured, code-like steps for its method. |
| Open Source Code | No | The paper does not provide any statement regarding the release of open-source code, nor does it include a link to a code repository. |
| Open Datasets | Yes | The public sentiment benchmark CMU-MOSI [Zadeh et al., 2016] is comprised of the aligned and preprocessed audio, video and text modality. and The POM dataset [Park et al., 2014] contains 903 movie opinion videos... |
| Dataset Splits | Yes | The 2199 clips are split into 1284 train samples, 229 validation samples, and 686 test samples. and The division of the train, validation and test sets is 600, 100 and 203, respectively. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | The grid-search is performed over the hyper-parameters to find the model with the best validation task loss. The range of key hyper-parameters are summarized as follows: layer [2, 7], tensor rank [2, 8], residual parameter α [0.1, 0.7]. |
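The reported hyper-parameter search can be sketched as an exhaustive grid search that keeps the configuration with the best validation task loss. This is a minimal illustration assuming the ranges quoted above; `train_and_validate` is a hypothetical stand-in for the authors' (unreleased) training code.

```python
# Hedged sketch of the grid search over layer [2, 7], tensor rank [2, 8],
# and residual parameter alpha [0.1, 0.7], as summarized in the paper.
from itertools import product

layers = range(2, 8)                                # layer in [2, 7]
ranks = range(2, 9)                                 # tensor rank in [2, 8]
alphas = [round(0.1 * i, 1) for i in range(1, 8)]   # alpha in [0.1, 0.7]

def train_and_validate(layer, rank, alpha):
    """Hypothetical placeholder: train MMT and return validation task loss."""
    # Dummy loss so the sketch runs end to end; replace with real training.
    return abs(layer - 4) + abs(rank - 5) + abs(alpha - 0.4)

best_cfg, best_loss = None, float("inf")
for layer, rank, alpha in product(layers, ranks, alphas):
    loss = train_and_validate(layer, rank, alpha)
    if loss < best_loss:
        best_cfg, best_loss = (layer, rank, alpha), loss

print(best_cfg)  # configuration with the lowest validation loss
```

With the dummy loss above, the loop evaluates all 6 × 7 × 7 combinations and keeps the minimizer; in a real reproduction each call would train a model and report its validation task loss.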