Generalizable Multi-linear Attention Network

Authors: Tao Jin, Zhou Zhao

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We conduct extensive experiments on several datasets of corresponding tasks; the experimental results show that MAN could achieve competitive results compared with baseline methods, showcasing the effectiveness of our contributions."

Researcher Affiliation | Academia | Tao Jin (Zhejiang University, jint_zju@zju.edu.cn); Zhou Zhao (Zhejiang University, zhaozhou@zju.edu.cn)

Pseudocode | No | The paper states "The detailed proof and complete algorithm process are shown in the appendix," but the pseudocode or algorithm itself is not provided in the main text.

Open Source Code | No | The paper does not provide any statement or link indicating that its source code is open or publicly available.

Open Datasets | Yes | "We evaluate MAN on three challenging tasks, multimodal sentiment analysis, multimodal speaker traits recognition, and multimodal video retrieval." The datasets are CMU-MOSI [41] (multimodal sentiment analysis), POM [25] (multimodal speaker traits recognition), and MSR-VTT [33] and LSMDC [28] (multimodal video retrieval).

Dataset Splits | Yes | CMU-MOSI: "There are 1284 segments in the training set, 229 in the validation set, and 686 in the test set." POM: "The training, validation, and test set distributions are approximately 600, 100, and 203, respectively."

Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used for running its experiments.

Software Dependencies | No | The paper mentions several tools and models used for feature extraction and comparison (GloVe embeddings [24], Facet [11], COVAREP [4], P2FA [37], S3D [32], DenseNet-161 [10], the VGGish model, the Google Cloud Speech API, ResNet-50 [8], SENet-154 [9]), but does not provide version numbers for these components or for the main frameworks/libraries used for implementation (e.g., PyTorch, TensorFlow).

Experiment Setup | Yes | For sentiment analysis and speaker traits recognition: Adam with learning rate 0.001; integration network of 1 integration block with hidden size 40, 10 heads, and 24 random features (hidden size denotes the common size of d1, d2, ..., dm); all time steps are divided into 4 chunks with local sequential constraints. For video retrieval: Adam with learning rate 5×10^-5, decayed by a multiplicative factor of 0.95 every 1000 optimization steps; integration network of 1 integration block with hidden size 512, 8 heads, and 512 random features; time steps are divided into 10 chunks with local sequential constraints.
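The reported hyperparameters can be collected into a small configuration sketch. This is not the authors' code; the dictionary keys and the `lr_at_step` helper are illustrative names, and only the numeric values come from the paper. The helper implements the stated step-wise multiplicative decay (factor 0.95 every 1000 optimization steps) for the video-retrieval setup.

```python
# Hyperparameters as reported in the paper; the dict structure and key
# names are illustrative, not taken from the authors' implementation.
CONFIG = {
    "sentiment_and_traits": {
        "optimizer": "Adam",
        "lr": 1e-3,
        "num_blocks": 1,         # 1 layer of integration block
        "hidden_size": 40,
        "num_heads": 10,
        "num_random_features": 24,
        "num_chunks": 4,         # time steps split into 4 chunks
    },
    "video_retrieval": {
        "optimizer": "Adam",
        "lr": 5e-5,
        "lr_decay": 0.95,        # multiplicative decay factor
        "decay_every": 1000,     # decay applied every 1000 steps
        "num_blocks": 1,
        "hidden_size": 512,
        "num_heads": 8,
        "num_random_features": 512,
        "num_chunks": 10,
    },
}

def lr_at_step(step, base_lr=5e-5, decay=0.95, every=1000):
    """Learning rate after step-wise multiplicative decay:
    base_lr * decay ** (step // every)."""
    return base_lr * decay ** (step // every)

print(lr_at_step(0))      # base rate, 5e-05
print(lr_at_step(2000))   # two decays applied: 5e-5 * 0.95**2
```

The same schedule could equivalently be expressed with a stock step scheduler (e.g., PyTorch's `StepLR` with `step_size=1000, gamma=0.95`), if the model were implemented in that framework.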