MuMu: Cooperative Multitask Learning-Based Guided Multimodal Fusion

Authors: Md Mofijul Islam, Tariq Iqbal (pp. 1043-1051)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated MuMu by comparing its performance to state-of-the-art multimodal HAR approaches on three activity datasets. Our extensive experimental results suggest that MuMu outperforms all the evaluated approaches across all three datasets.
Researcher Affiliation | Academia | Md Mofijul Islam, Tariq Iqbal, School of Engineering and Applied Science, University of Virginia, {mi8uu,tiqbal}@virginia.edu
Pseudocode | No | The paper describes the system architecture and components textually but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link or an explicit statement of code release) for its methodology.
Open Datasets | Yes | We evaluated the performance of our proposed approach, MuMu, by applying it on three multimodal activity datasets: UCSD-MIT (Kubota et al. 2019), UTD-MHAD (Chen, Jafari, and Kehtarnavaz 2015) and MMAct (Kong et al. 2019).
Dataset Splits | Yes | For MMAct dataset, we followed originally proposed cross-subject and cross-session evaluation settings and reported F1-scores (Tables 1 & 2). For UTD-MHAD and UCSD-MIT datasets, we followed leave-one-subject-out cross-validation and reported top-1 accuracies (Tables 4 & 3).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. It only states 'For more implementation and training procedure details, please check the supplementary materials.' without specifics in the main text.
Software Dependencies | No | The paper mentions models like 'ResNet-50' and the 'Co-occurrence approach' but does not specify software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1).
Experiment Setup | Yes | We segmented the data from visual modalities (RGB and depth) with a window size of 1 and a stride of 3. For the data from other sensor modalities, we used a window size of 5 and a stride of 5. The unimodal feature of each modality is encoded to a 128-sized feature embedding. We used two fully connected layers with ReLU activation after the first layer for activity-group classification in auxiliary task learning. We used similar task learning architecture for the activity classification in target task learning.
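
The Dataset Splits row above describes standard evaluation protocols. As a minimal illustrative sketch (not the authors' code), the leave-one-subject-out cross-validation used for UTD-MHAD and UCSD-MIT can be expressed with scikit-learn's LeaveOneGroupOut; the features, labels, subject ids, and linear classifier below are hypothetical placeholders standing in for the actual multimodal pipeline.

```python
# Sketch of leave-one-subject-out (LOSO) cross-validation as described above.
# Not the authors' code: random placeholder features and a linear classifier
# stand in for the MuMu model and real sensor data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
features = rng.normal(size=(160, 128))      # placeholder 128-d embeddings
labels = rng.integers(0, 27, size=160)      # placeholder activity labels
subject_ids = rng.integers(0, 8, size=160)  # subject id per sample (8 subjects assumed)

logo = LeaveOneGroupOut()
top1_scores = []
for train_idx, test_idx in logo.split(features, labels, groups=subject_ids):
    # Train on all subjects except one, test on the held-out subject.
    clf = LogisticRegression(max_iter=1000).fit(features[train_idx], labels[train_idx])
    top1_scores.append(clf.score(features[test_idx], labels[test_idx]))

# Report mean top-1 accuracy across held-out subjects.
print(f"LOSO top-1 accuracy: {np.mean(top1_scores):.3f}")
```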
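
Likewise, the Experiment Setup row can be pictured with a rough PyTorch sketch (again, not the authors' implementation): sliding-window segmentation with the quoted window/stride values, and two-layer fully connected heads with ReLU after the first layer for the auxiliary and target tasks. The hidden width, class counts, and input shapes below are assumptions, not values reported in the paper.

```python
# Sketch of the described setup. Window/stride values follow the quoted text;
# hidden width, class counts, and input shapes are hypothetical placeholders.
import torch
import torch.nn as nn

def segment(signal, window, stride):
    """Slice a (time, channels) tensor into windows of shape (num_windows, channels, window)."""
    return signal.unfold(0, window, stride)

class TwoLayerHead(nn.Module):
    """Two fully connected layers with ReLU after the first, as described in the setup."""
    def __init__(self, in_dim=128, hidden_dim=128, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Segmentation: visual modalities use window 1 / stride 3; other sensors use window 5 / stride 5.
rgb_frames = torch.randn(90, 3)   # placeholder (time, channels) stream
imu_signal = torch.randn(90, 6)
rgb_windows = segment(rgb_frames, window=1, stride=3)
imu_windows = segment(imu_signal, window=5, stride=5)

# Each modality is encoded to a 128-d embedding (encoders elided), then fed to
# the auxiliary (activity-group) head and the target (activity) head.
embedding = torch.randn(4, 128)            # placeholder unimodal/fused embedding
aux_head = TwoLayerHead(num_classes=5)     # hypothetical number of activity groups
target_head = TwoLayerHead(num_classes=27) # hypothetical number of activities
group_logits = aux_head(embedding)
activity_logits = target_head(embedding)
```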