Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
Authors: Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS 2021, covering classification, regression and segmentation tasks. The results show that CGGM outperforms all the baselines and other state-of-the-art methods consistently, demonstrating its effectiveness and versatility. |
| Researcher Affiliation | Collaboration | Zirun Guo (1,2), Tao Jin (1), Jingyuan Chen (1), Zhou Zhao (1,2); 1 Zhejiang University, 2 Shanghai AI Lab |
| Pseudocode | Yes | Algorithm 1 Classifier-guided gradient modulation |
| Open Source Code | Yes | Our code is available at https://github.com/zrguo/CGGM. |
| Open Datasets | Yes | We use four multimodal datasets: UPMC-Food 101 [23], CMU-MOSI [27], IEMOCAP [3], and BraTS 2021 [1]. |
| Dataset Splits | No | The paper mentions using specific datasets and following previous work for some (e.g., CMU-MOSI and IEMOCAP "Following previous work [18, 10]"), but it does not explicitly provide the train/validation/test split percentages or sample counts for any of the datasets (UPMC-Food 101, CMU-MOSI, IEMOCAP, BraTS 2021). |
| Hardware Specification | No | The paper does not specify the exact GPU models, CPU types, or any other detailed hardware specifications used for running the experiments. It only mentions "Additional gpu memory cost (MB)" in Appendix C, implying GPUs were used, but without models. |
| Software Dependencies | No | The paper mentions using pre-trained models (e.g., "bert-base-uncased model [5]", "ViT [6]") and optimizers (e.g., "Adam optimizer", "AdamW optimizer", "SGD optimizer"), but it does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Table 7 presents the main hyperparameters of the four datasets. For BraTS 2021, the initial learning rate is set to 4e-4, warmed up over the warm-up epochs to 1e-2, and the final learning rate is 1e-3. For the loss function, we use the combination of soft Dice loss and cross-entropy loss, which can be represented as L_task = L_Dice + λ1·L_CE. We set λ1 to 1. In particular, we use a weighted cross-entropy loss function, where the weights are 0.2, 0.3, 0.25 and 0.25 for the background, label 1, label 2 and label 3, respectively. (A hedged code sketch of this loss follows the table.) |
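
For concreteness, the combined segmentation loss quoted in the Experiment Setup row can be sketched as below. This is a minimal PyTorch sketch, not the authors' released code: the class name `CombinedSegLoss`, the smoothing constant, and the exact soft Dice formulation are assumptions; only λ1 = 1 and the class weights (0.2, 0.3, 0.25, 0.25) come from the paper's quoted setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CombinedSegLoss(nn.Module):
    """Sketch of L_task = L_Dice + lambda_1 * L_CE with class-weighted cross-entropy.

    lambda_1 = 1 and the class weights (background, label 1, label 2, label 3)
    follow the setup quoted in the table; the soft Dice formulation and the
    smoothing term are assumptions made for this illustration.
    """

    def __init__(self, lambda_1=1.0, class_weights=(0.2, 0.3, 0.25, 0.25), smooth=1e-5):
        super().__init__()
        self.lambda_1 = lambda_1
        self.smooth = smooth
        # Per-class weights for the cross-entropy term.
        self.register_buffer("ce_weights", torch.tensor(class_weights))

    def soft_dice(self, logits, targets):
        # logits: (B, C, ...) raw scores; targets: (B, ...) integer class labels.
        num_classes = logits.shape[1]
        probs = torch.softmax(logits, dim=1)
        one_hot = F.one_hot(targets, num_classes).movedim(-1, 1).float()
        spatial_dims = tuple(range(2, probs.ndim))
        intersection = (probs * one_hot).sum(spatial_dims)
        cardinality = probs.sum(spatial_dims) + one_hot.sum(spatial_dims)
        dice = (2.0 * intersection + self.smooth) / (cardinality + self.smooth)
        # Average the per-class, per-sample Dice scores and turn them into a loss.
        return 1.0 - dice.mean()

    def forward(self, logits, targets):
        ce = F.cross_entropy(logits, targets, weight=self.ce_weights)
        return self.soft_dice(logits, targets) + self.lambda_1 * ce
```

A typical call would be `loss = CombinedSegLoss()(logits, targets)` with `logits` of shape (B, 4, D, H, W) and integer target labels of shape (B, D, H, W).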