Calibrating Multimodal Learning

Authors: Huan Ma, Qingyang Zhang, Changqing Zhang, Bingzhe Wu, Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, through extensive empirical studies, we identify that current multimodal classification methods suffer from unreliable predictive confidence and tend to rely on partial modalities when estimating confidence. |
| Researcher Affiliation | Collaboration | (1) College of Intelligence and Computing, Tianjin University, Tianjin, China; (2) AI Lab, Tencent, Shenzhen, China; (3) Tianjin Key Lab of Machine Learning, Tianjin, China; (4) Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore; (5) Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore. |
| Pseudocode | Yes | Algorithm 1: Calibrating Multimodal Classifier |
| Open Source Code | No | No explicit statement or link providing access to the source code for the methodology described in this paper was found. The reference to the 'MindSpore Open Fund' is an acknowledgment, not a code release. |
| Open Datasets | Yes | Datasets: We evaluate the proposed method on diverse multimodal datasets, including Yale B (Georghiades et al., 2002), Handwritten (Perkins & Theiler, 2003), CUB (Wah et al., 2011), Animal (a class-imbalanced dataset; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015), TUANDROMD (Borah et al., 2020), NYUD2 (Qi et al., 2017), and SUNRGBD (Song et al., 2015). |
| Dataset Splits | No | While the paper mentions using a 'validation set' for hyperparameter tuning in Section H.5, it does not provide specific percentages or sample counts for the overall training/validation/test splits across all experiments, often deferring to 'previous work' for the data division. |
| Hardware Specification | Yes | Table 7: Training time (Platform: RTX 3090 × 8). |
| Software Dependencies | No | The paper mentions 'CUDA Version: 11.2' but does not list the key software components with specific version numbers (e.g., PyTorch, TensorFlow, or other library versions) needed for full reproducibility. |
| Experiment Setup | Yes | The parameter lambda for CUB/Animal/Handwritten/Yale B/TUANDROMD is set to 5/45/45/10/5. The dimensionalities of the input and hidden layers are 128 and 300. We use the Adam optimizer to train all CPM-Nets models with a learning rate of 10^-2 and no additional regularization term. ... The parameter lambda for CUB/Animal/Handwritten/Yale B/TUANDROMD is set to 15/25/10/35/75 for best performance. The dimensionality of the latent space is 64. We use the Adam optimizer to train the encoder and decoder with a learning rate of 10^-2, then train the encoder, decoder, and classifier jointly with a learning rate of 10^-3. |
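The experiment-setup details quoted above can be collected into a small configuration sketch. This is purely illustrative: the dictionary names, keys, and the `lambda_for` helper are assumptions for readability, not taken from the authors' code; only the numeric values come from the quoted text.

```python
# Hypothetical configuration sketch; values are quoted from the paper,
# structure and names are illustrative assumptions.

# Per-dataset lambda for the CPM-Nets experiments (CUB/Animal/Handwritten/Yale B/TUANDROMD).
CPM_NETS_LAMBDA = {"cub": 5, "animal": 45, "handwritten": 45, "yaleb": 10, "tuandromd": 5}

# Per-dataset lambda for the encoder-decoder experiments.
AUTOENCODER_LAMBDA = {"cub": 15, "animal": 25, "handwritten": 10, "yaleb": 35, "tuandromd": 75}

TRAIN_CONFIG = {
    # CPM-Nets: input dim 128, hidden dim 300, Adam at lr 1e-2, no extra regularization.
    "cpm_nets": {"input_dim": 128, "hidden_dim": 300, "optimizer": "Adam", "lr": 1e-2},
    # Encoder-decoder: latent dim 64; pretrain at lr 1e-2, then joint fine-tuning at lr 1e-3.
    "autoencoder": {"latent_dim": 64, "pretrain_lr": 1e-2, "finetune_lr": 1e-3},
}

def lambda_for(dataset: str, model: str = "cpm_nets") -> int:
    """Look up the quoted lambda value for a given dataset and experiment family."""
    table = CPM_NETS_LAMBDA if model == "cpm_nets" else AUTOENCODER_LAMBDA
    return table[dataset]
```

For example, `lambda_for("animal")` returns 45 for the CPM-Nets setting, while `lambda_for("animal", "autoencoder")` returns 25, matching the two slash-separated lists in the quote.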