Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention
Authors: Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan, Yuexian Zou (pp. 13098-13106)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results and analysis demonstrate the effectiveness of the proposed approaches. First, the proposed DIIA boosts the baseline models by up to 21.08% in accuracy; second, under unimodal scenarios, the MKD module allows the paper's multimodal MC model to significantly outperform unimodal models, which are trained and tested with only audio or textual data, by up to 18.87%. |
| Researcher Affiliation | Collaboration | Zhiqi Huang (1), Fenglin Liu (1), Xian Wu (2), Shen Ge (2), Helin Wang (1), Wei Fan (2), Yuexian Zou (1,3); (1) ADSPLAB, School of ECE, Peking University, China; (2) Tencent, China; (3) Peng Cheng Laboratory, China. Emails: {zhiqihuang, fenglinliu98, wanghl15, zouyx}@pku.edu.cn, {kevinxwu, shenge, davidwfan}@tencent.com |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations (e.g., MHA, FFN, attention calculations) and provides a figure illustrating the architecture, but it does not include any explicit pseudocode blocks or algorithms (a generic MHA sketch is given after the table for reference). |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper states, 'we also collect two audio-oriented multimodal machine comprehension datasets, i.e., L-TOEFL and CET, from English listening tests' and provides URLs for TOEFL ETS and CET official websites. However, it does not provide specific access information (e.g., a direct download link, DOI, or repository) for the *compiled and used* L-TOEFL and CET datasets themselves. |
| Dataset Splits | Yes | We randomly divide the L-TOEFL and CET datasets into 1000/162/162 and 657/110/109 examples for train/dev/test partitioning, respectively, following approximate ratios of 0.75/0.125/0.125 (a split sketch appears after the table). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. It only mentions using VGGish for feature extraction. |
| Software Dependencies | No | The paper mentions using VGGish, GloVe vectors, and the Adam optimizer, but it does not specify any version numbers for these software components or any other libraries/environments (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We adopt the Adam optimizer for optimizing the parameters, with a mini-batch size of 12 and an initial learning rate of 0.001. After training for 100 epochs, we select the model that works best on the dev set, and then evaluate it on the test set in terms of accuracy (%) (a training-loop sketch appears after the table). |
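
Since the paper provides no pseudocode, the following is a minimal sketch of the standard Transformer-style multi-head attention (MHA) that the paper's formulations build on. This is not the authors' DIIA implementation; the framework (PyTorch) and all dimensions and names here are assumptions.

```python
# Minimal sketch of standard multi-head attention (MHA), the building
# block the paper's formulations reference. NOT the authors' DIIA code;
# PyTorch and all shapes/defaults are assumptions.
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_k = d_model // n_heads
        self.n_heads = n_heads
        # Separate linear projections for queries, keys, values, output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value):
        # query/key/value: (batch, seq_len, d_model)
        b = query.size(0)
        # Project and split into heads: (batch, heads, seq_len, d_k).
        q = self.w_q(query).view(b, -1, self.n_heads, self.d_k).transpose(1, 2)
        k = self.w_k(key).view(b, -1, self.n_heads, self.d_k).transpose(1, 2)
        v = self.w_v(value).view(b, -1, self.n_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
        scores = q @ k.transpose(-2, -1) / (self.d_k ** 0.5)
        out = torch.softmax(scores, dim=-1) @ v
        # Merge heads back and apply the output projection.
        out = out.transpose(1, 2).contiguous().view(b, -1, self.n_heads * self.d_k)
        return self.w_o(out)
```

In an inter-modality (cross-modal) setting of the kind the paper describes, the queries would come from one modality (e.g., question/passage text) and the keys/values from the other (e.g., VGGish audio features), while intra-modality attention feeds the same modality to all three inputs.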
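The reported splits (1000/162/162 for L-TOEFL, 657/110/109 for CET) roughly follow 0.75/0.125/0.125 ratios. Below is a minimal sketch of reproducing such a random split; the seed, rounding, and function name are illustrative assumptions, as the paper does not specify how the shuffle was performed.

```python
# Sketch of the reported ~0.75/0.125/0.125 random split. The seed and
# helper name are assumptions; the paper does not describe its shuffle.
import random

def train_dev_test_split(examples, ratios=(0.75, 0.125, 0.125), seed=0):
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = round(n * ratios[0])
    n_dev = round(n * ratios[1])
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]
    return train, dev, test

# E.g., CET has 876 examples in total: 876 * 0.75 = 657 train, and the
# remaining 219 split into 110 dev / 109 test, matching the paper.
```

Note that the L-TOEFL counts (1000/162/162 out of 1324) only approximate the stated ratios, so exact reproduction would require the authors' original partition.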
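The optimization details in the setup row translate directly into a training configuration. Below is a minimal sketch assuming PyTorch (the paper does not name its framework), a cross-entropy objective over answer options, and placeholder `model`, `train_loader`, and `evaluate_accuracy` objects; the dev-set checkpoint selection mirrors the quoted protocol.

```python
# Sketch of the quoted protocol: Adam, mini-batch size 12, lr 0.001,
# 100 epochs, keeping the checkpoint with the best dev accuracy.
# PyTorch, the loss choice, and all placeholder names are assumptions.
import copy
import torch

def train(model, train_loader, dev_loader, evaluate_accuracy, device="cuda"):
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = torch.nn.CrossEntropyLoss()
    best_acc, best_state = -1.0, None
    for epoch in range(100):
        model.train()
        for inputs, labels in train_loader:  # DataLoader built with batch_size=12
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
        # Select the checkpoint that works best on the dev set.
        acc = evaluate_accuracy(model, dev_loader, device)
        if acc > best_acc:
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model, best_acc
```

Since the paper reports neither hardware nor software versions, this sketch fixes only the hyperparameters it does state; everything else would need to be guessed or obtained from the authors.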