Memory Fusion Network for Multi-view Sequential Learning

Authors: Amir Zadeh, Paul Pu Liang, Navonil Mazumder, Soujanya Poria, Erik Cambria, Louis-Philippe Morency

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experimentation, MFN is compared to various proposed approaches for multi-view sequential learning on multiple publicly available benchmark datasets. MFN outperforms all the multi-view approaches. Furthermore, MFN outperforms all current state-of-the-art models, setting new state-of-the-art results for all three multi-view datasets.
Researcher Affiliation | Academia | Amir Zadeh, Carnegie Mellon University, USA (abagherz@cs.cmu.edu); Paul Pu Liang, Carnegie Mellon University, USA (pliang@cs.cmu.edu); Navonil Mazumder, Instituto Politécnico Nacional, Mexico (navonil@sentic.net); Soujanya Poria, NTU, Singapore (sporia@ntu.edu.sg); Erik Cambria, NTU, Singapore (cambria@ntu.edu.sg); Louis-Philippe Morency, Carnegie Mellon University, USA (morency@cs.cmu.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It provides mathematical equations for the model components but no algorithmic steps. (A hedged sketch of those components follows this table.)
Open Source Code | Yes | All the code and data required to recreate the reported results are available at https://github.com/A2Zadeh/MFN.
Open Datasets | Yes | We choose three multi-view domains: multimodal sentiment analysis, emotion recognition and speaker traits analysis. [...] We use four different datasets for English and Spanish sentiment analysis in our experiments. The CMU-MOSI dataset (Zadeh et al. 2016)... The MOUD dataset (Perez-Rosas, Mihalcea, and Morency 2013)... The YouTube dataset (Morency, Mihalcea, and Doshi 2011)... The ICT-MMMO dataset (Wöllmer et al. 2013)... We perform experiments on IEMOCAP dataset (Busso et al. 2008)... The POM dataset (Park et al. 2014)...
Dataset Splits | Yes | The training, validation and testing splits are performed so that the splits are speaker independent. The full set of videos (and segments for datasets where the annotations are at the resolution of segments) in each split is detailed in Table 1. (A sketch of one way to build a speaker-independent split follows this table.)
Hardware Specification | Yes | On a Nvidia GTX 1080 Ti GPU, Zadeh2017 runs with an average frequency of 278 IPS (data point inferences per second) while our model runs at an ultra realtime frequency of 2858 IPS. (A sketch of how such an IPS figure could be measured follows this table.)
Software Dependencies | No | The paper mentions software like GloVe word embeddings, P2FA, Facet, and COVAREP, but does not provide specific version numbers for these software components.
Experiment Setup | No | The paper describes data preprocessing steps (time steps based on word utterances, feature calculation, speaker-independent splits) and states that hyperparameters were chosen based on a validation set, but it does not provide specific hyperparameter values or detailed training configurations (e.g., learning rate, batch size, number of epochs, optimizer specifics). (A sketch of validation-based hyperparameter selection follows this table.)
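
The paper expresses its model only in equations: a system of per-view LSTMs, attention over the LSTM memories at consecutive time steps, and a gated multi-view memory. Below is a minimal, simplified PyTorch sketch of how those components could be wired together; the layer sizes, the single-linear-layer attention and gate networks, and the class name MFNSketch are illustrative assumptions, not the authors' released implementation (see https://github.com/A2Zadeh/MFN for that).

```python
import torch
import torch.nn as nn


class MFNSketch(nn.Module):
    """Simplified sketch: per-view LSTMs + attention over the memories at
    consecutive time steps + a gated multi-view memory (illustrative only)."""

    def __init__(self, view_dims, hidden=32, mem_dim=64):
        super().__init__()
        self.lstms = nn.ModuleList(nn.LSTMCell(d, hidden) for d in view_dims)
        cat_dim = 2 * hidden * len(view_dims)  # memories at t-1 and t, all views
        self.attn = nn.Sequential(nn.Linear(cat_dim, cat_dim), nn.Softmax(dim=1))
        self.candidate = nn.Linear(cat_dim, mem_dim)
        self.retain_gate = nn.Sequential(nn.Linear(cat_dim, mem_dim), nn.Sigmoid())
        self.update_gate = nn.Sequential(nn.Linear(cat_dim, mem_dim), nn.Sigmoid())

    def forward(self, views):
        # views: list of tensors, one per view, each shaped (batch, time, view_dim)
        batch, steps, _ = views[0].shape
        hidden = self.lstms[0].hidden_size
        h = [v.new_zeros(batch, hidden) for v in views]
        c = [v.new_zeros(batch, hidden) for v in views]
        u = views[0].new_zeros(batch, self.candidate.out_features)  # multi-view memory
        for t in range(steps):
            c_prev = torch.cat(c, dim=1)
            for i, lstm in enumerate(self.lstms):
                h[i], c[i] = lstm(views[i][:, t], (h[i], c[i]))
            c_now = torch.cat(c, dim=1)
            delta = torch.cat([c_prev, c_now], dim=1)   # memories across t-1 and t
            attended = delta * self.attn(delta)         # attention weights applied
            u = (self.retain_gate(attended) * u
                 + self.update_gate(attended) * torch.tanh(self.candidate(attended)))
        # Final per-view states concatenated with the multi-view memory.
        return torch.cat(h + [u], dim=1)
```

For example, MFNSketch(view_dims=[300, 35, 74]) (hypothetical feature dimensionalities) would accept three time-aligned views of a batch and return the concatenation of the final LSTM states and the multi-view memory for a downstream classifier or regressor.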
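
The splits are reported as speaker independent. Here is a minimal sketch of one way to build such a split, assuming scikit-learn is available and that each segment carries a speaker (or video) identifier; the function name and the 80/20 ratio are illustrative, not the paper's protocol.

```python
from sklearn.model_selection import GroupShuffleSplit


def speaker_independent_split(segments, speaker_ids, test_size=0.2, seed=0):
    """Return train/test indices such that no speaker appears in both splits."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(segments, groups=speaker_ids))
    return train_idx, test_idx
```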
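
The hardware row quotes an inferences-per-second (IPS) comparison on a GTX 1080 Ti. The following is a rough sketch of how such a figure could be measured with PyTorch; the warm-up length, repeat count, and helper name are assumptions, and the paper does not describe its exact timing procedure.

```python
import time

import torch


def inferences_per_second(model, batch, points_per_batch, repeats=100):
    """Time repeated forward passes and report data-point inferences per second."""
    model.eval()
    with torch.no_grad():
        for _ in range(10):                      # warm-up passes
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()             # make GPU timings honest
        start = time.perf_counter()
        for _ in range(repeats):
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return repeats * points_per_batch / elapsed
```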
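
Since the paper only states that hyperparameters were chosen on a validation set, here is a hedged sketch of a plain grid search driven by a user-supplied train_and_eval callback; the grid values and the function name are purely illustrative and do not come from the paper.

```python
import itertools


def select_hyperparameters(train_and_eval, grid):
    """train_and_eval(config) should return a validation score (higher is better)."""
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*grid.values()):
        config = dict(zip(grid.keys(), values))
        score = train_and_eval(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Illustrative grid; none of these values are reported in the paper.
example_grid = {"hidden": [32, 64], "mem_dim": [64, 128], "lr": [1e-3, 1e-4]}
```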