Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning

Authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental evaluations on the MUStARD, CMU-MOSI, and CMU-MOSEI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks. |
| Researcher Affiliation | Academia | University of Southern California, Los Angeles, CA 90089, USA; Hong Kong University of Science and Technology, Hong Kong, China |
| Pseudocode | Yes | The pseudo-code of the ITHP algorithm is provided in Appendix D. |
| Open Source Code | Yes | Our codebase can be found at https://github.com/joshuaxiao98/ITHP. |
| Open Datasets | Yes | In this section, we evaluate our proposed Information-Theoretic Hierarchical Perception (ITHP) model on three popular multimodal datasets: the Multimodal Sarcasm Detection Dataset (MUStARD; Castro et al., 2019), the Multimodal Opinion-level Sentiment Intensity dataset (MOSI; Zadeh et al., 2016), and the Multimodal Opinion Sentiment and Emotion Intensity dataset (CMU-MOSEI; Zadeh et al., 2018d). |
| Dataset Splits | Yes | The evaluation is performed using a 5-fold cross-validation approach to ensure robustness and reliability (see the training-loop sketch below the table). |
| Hardware Specification | Yes | All experiments were conducted on Nvidia A100 40GB GPUs. |
| Software Dependencies | No | The paper mentions using assets from BERT and DeBERTa and links to their general GitHub repositories, but it does not pin version numbers for the software libraries (e.g., Python, PyTorch, TensorFlow) used in its own implementation. |
| Experiment Setup | Yes | For the task of sarcasm detection, unless otherwise specified, we set the hyperparameters as follows: β = 32, γ = 8, λ = 1. We perform a 5-fold cross-validation, and for each experiment, we train the ITHP model for 200 epochs using an Adam optimizer with a learning rate of 10^-3. For the task of sentiment analysis, unless otherwise specified, we set the hyperparameters as follows: β = 8, γ = 32, λ = 1. We run each experiment for 40 epochs using an Adam optimizer with a learning rate of 10^-5. (Both configurations are sketched in code below the table.) |