Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Authors: Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on four multimodal datasets: UPMC-Food101, CMU-MOSI, IEMOCAP and BraTS 2021, covering classification, regression and segmentation tasks. The results show that CGGM outperforms all the baselines and other state-of-the-art methods consistently, demonstrating its effectiveness and versatility.
Researcher Affiliation | Collaboration | Zirun Guo (1,2), Tao Jin (1), Jingyuan Chen (1), Zhou Zhao (1,2); (1) Zhejiang University, (2) Shanghai AI Lab
Pseudocode | Yes | Algorithm 1: Classifier-guided gradient modulation (an illustrative sketch of the idea appears after this table).
Open Source Code | Yes | Our code is available at https://github.com/zrguo/CGGM.
Open Datasets | Yes | We use four multimodal datasets: UPMC-Food101 [23], CMU-MOSI [27], IEMOCAP [3], and BraTS 2021 [1].
Dataset Splits | No | The paper names the datasets and notes that some follow previous work (e.g., for CMU-MOSI and IEMOCAP: "Following previous work [18, 10]"), but it does not explicitly provide train/validation/test split percentages or sample counts for any of the datasets (UPMC-Food101, CMU-MOSI, IEMOCAP, BraTS 2021).
Hardware Specification | No | The paper does not specify the exact GPU models, CPU types, or any other detailed hardware specifications used for the experiments. It only mentions "Additional GPU memory cost (MB)" in Appendix C, implying GPUs were used, but without naming the models.
Software Dependencies | No | The paper mentions pre-trained models (e.g., the "bert-base-uncased" model [5], ViT [6]) and optimizers (Adam, AdamW, SGD), but it does not provide version numbers for any software libraries, frameworks (such as PyTorch or TensorFlow), or programming languages used.
Experiment Setup | Yes | Table 7 presents the main hyperparameters for the four datasets. For BraTS 2021, the starting learning rate is set to 4e-4, warmed up over the warm-up epochs to 1e-2, and the final learning rate is 1e-3. For the loss function, we use a combination of soft Dice loss and cross-entropy loss, L_task = L_Dice + λ1 · L_CE, and set λ1 to 1. In particular, we use a weighted cross-entropy loss, with weights of 0.2, 0.3, 0.25 and 0.25 for the background, label 1, label 2 and label 3, respectively. (This loss is sketched in code after the table.)
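
The BraTS 2021 loss quoted in the last row is concrete enough to write down. Below is a minimal PyTorch sketch of L_task = L_Dice + λ1 · L_CE with λ1 = 1 and cross-entropy class weights (0.2, 0.3, 0.25, 0.25); the tensor shapes, the soft-Dice formulation, and the function names are assumptions made for illustration, not code taken from the authors' repository.

import torch
import torch.nn.functional as F

# Per-class cross-entropy weights from the paper: background, label 1, label 2, label 3.
CE_WEIGHTS = torch.tensor([0.20, 0.30, 0.25, 0.25])
LAMBDA_1 = 1.0  # weight of the cross-entropy term in L_task = L_Dice + lambda_1 * L_CE

def soft_dice_loss(logits, one_hot_target, eps=1e-5):
    # Soft Dice loss averaged over classes; one_hot_target has shape (N, C, ...).
    probs = torch.softmax(logits, dim=1)
    spatial_dims = tuple(range(2, logits.ndim))
    intersection = (probs * one_hot_target).sum(spatial_dims)
    cardinality = probs.sum(spatial_dims) + one_hot_target.sum(spatial_dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return 1.0 - dice.mean()

def brats_task_loss(logits, target_indices):
    # logits: (N, 4, D, H, W); target_indices: (N, D, H, W) with integer labels in {0, 1, 2, 3}.
    one_hot = F.one_hot(target_indices, num_classes=4).movedim(-1, 1).float()
    l_dice = soft_dice_loss(logits, one_hot)
    l_ce = F.cross_entropy(logits, target_indices, weight=CE_WEIGHTS.to(logits.device))
    return l_dice + LAMBDA_1 * l_ce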
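
The paper's Algorithm 1 (classifier-guided gradient modulation) is not reproduced here; consult the pseudocode in the paper or the repository linked above for the exact procedure. As a rough, hypothetical illustration of the general idea, in which auxiliary per-modality classifiers score how well each modality is learning and those scores rescale each modality encoder's gradients, a PyTorch-style sketch might look as follows. The coefficient rule, function names, and dict-of-encoders interface are all assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def modulation_coefficients(unimodal_logits, labels):
    # Map each modality's auxiliary-classifier loss to a gradient-scaling coefficient.
    # Hypothetical rule: modalities with a higher (lagging) auxiliary loss receive a
    # proportionally larger coefficient; the actual update rule is given in Algorithm 1.
    losses = {m: F.cross_entropy(logits, labels) for m, logits in unimodal_logits.items()}
    total = sum(losses.values())
    return {m: (len(losses) * loss / total).detach() for m, loss in losses.items()}

def modulated_backward(task_loss, encoders, coeffs):
    # Backpropagate the multimodal task loss, then rescale each encoder's gradients
    # by its modality coefficient before the optimizer step.
    task_loss.backward()
    with torch.no_grad():
        for modality, encoder in encoders.items():
            for p in encoder.parameters():
                if p.grad is not None:
                    p.grad.mul_(coeffs[modality])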