Multi-Head Modularization to Leverage Generalization Capability in Multi-Modal Networks

Authors: Jun-Tae Lee, Hyunsin Park, Sungrack Yun, Simyung Chang (pp. 7354-7362)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify the effectiveness of MHM on various multi-modal tasks. We use the state-of-the-art methods as baselines, and show notable performance gains for all the baselines. We conduct extensive experiments to analyze the efficacy of MHM in terms of generalization capability. For three multi-modal tasks (audio-visual event detection, action localization, sentiment analysis), we successfully boost the performance of the state-of-the-art methods on benchmark datasets (AVE, THUMOS14, CMU-MOSEI).
Researcher Affiliation | Industry | Jun-Tae Lee (1), Hyunsin Park (1), Sungrack Yun (1), and Simyung Chang (2); (1) Qualcomm AI Research, (2) Qualcomm Korea YH; {juntlee,hyunsinp,sungrack,simychan}@qti.qualcomm.com
Pseudocode | No | The paper describes the MHM algorithm in detail in the text (Section 4) but does not provide it as a structured pseudocode block or a formally labeled algorithm.
Open Source Code | No | The paper does not provide any statement regarding the release of source code or a link to a code repository.
Open Datasets | Yes | We perform experiments on the AVE dataset (Tian et al. 2018). We use the THUMOS14 (Jiang et al. 2014) dataset. We evaluate our method for multi-modal sentiment analysis on the CMU-MOSEI (Zadeh et al. 2018) dataset.
Dataset Splits | Yes | For each class, we randomly select 90% of data points for training and use the remaining for testing (toy dataset). The AVE dataset consists of 3,339 training and 804 testing videos. THUMOS14 (Jiang et al. 2014) contains 200 training and 212 testing videos. (See the split sketch after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory, cloud instances) used to run the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers) that would be needed to replicate the experiments.
Experiment Setup | Yes | For audio-visual event detection, we use four head modules. For RGB-flow action localization, the number of head modules is set to 2. For multi-modal sentiment analysis, K is empirically set as 3. (See the configuration sketch after this table.)
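
The Dataset Splits row reports a per-class random 90/10 split for the toy dataset. The snippet below is a minimal sketch of such a stratified split, not the authors' code; the function name, variable names, and fixed seed are our own assumptions added for reproducibility of the illustration.

```python
import random
from collections import defaultdict

def per_class_split(samples, labels, train_ratio=0.9, seed=0):
    """Randomly split samples into train/test subsets per class.

    `samples` and `labels` are parallel lists; the 90/10 ratio follows the
    protocol quoted above, while the seed is an assumption of this sketch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(round(train_ratio * len(indices)))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])

    train = [samples[i] for i in train_idx]
    test = [samples[i] for i in test_idx]
    return train, test

# Usage: train_set, test_set = per_class_split(points, point_labels)
```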
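
The Experiment Setup row only states the number of head modules K used per task (4, 2, and 3). The sketch below shows one plausible way to attach K identical head modules to a shared multi-modal feature; the class name, head architecture, hidden sizes, and mean aggregation are assumptions for illustration, not the paper's published MHM design.

```python
import torch
import torch.nn as nn

# Number of head modules K per task, as reported in the Experiment Setup row.
NUM_HEADS = {
    "audio_visual_event_detection": 4,
    "rgb_flow_action_localization": 2,
    "multimodal_sentiment_analysis": 3,
}

class MultiHeadModularNet(nn.Module):
    """Illustrative multi-head wrapper: a shared fused feature feeds K heads.

    The head depth and the mean aggregation of head outputs are placeholders
    chosen for this sketch; the paper does not specify them in this summary.
    """

    def __init__(self, feature_dim: int, num_classes: int, num_heads: int):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(feature_dim, feature_dim),
                nn.ReLU(),
                nn.Linear(feature_dim, num_classes),
            )
            for _ in range(num_heads)
        )

    def forward(self, fused_features: torch.Tensor) -> torch.Tensor:
        # Each head produces its own logits; averaging is one simple way to
        # combine the modularized heads at inference time.
        logits = torch.stack([head(fused_features) for head in self.heads])
        return logits.mean(dim=0)

# Example: the audio-visual event detection variant uses K = 4 heads.
# feature_dim and num_classes are placeholder values for this sketch.
model = MultiHeadModularNet(
    feature_dim=256,
    num_classes=28,
    num_heads=NUM_HEADS["audio_visual_event_detection"],
)
```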