Probabilistic Conformal Distillation for Enhancing Missing Modality Robustness

Authors: Mengxi Chen, Fei Zhang, Zihua Zhao, Jiangchao Yao, Ya Zhang, Yanfeng Wang

NeurIPS 2024

Reproducibility assessment: each variable, its result, and the supporting excerpt from the paper (LLM response).
Research Type: Experimental. "Extensive experiments on a range of benchmark datasets demonstrate the superiority of PCD over state-of-the-art methods." "Extensive comparison on multimodal classification and segmentation tasks consistently validate the superior performance of our method compared to the state-of-the-art approaches."
Researcher Affiliation: Academia. "1 Cooperative Medianet Innovation Center, Shanghai Jiao Tong University; 2 School of Artificial Intelligence, Shanghai Jiao Tong University; 3 Shanghai Artificial Intelligence Laboratory. {mxchen_mc, ferenas, sjtuszzh, Sunarker, ya_zhang, wangyanfeng}@sjtu.edu.cn"
Pseudocode: Yes. "The whole training procedure of PCD is shown in Algorithm 1."
Open Source Code: Yes. "Code is available at: https://github.com/mxchen-mc/PCD."
Open Datasets: Yes. "Datasets. We implement experiments on four multimodal datasets, comprising two classification datasets CASIA-SURF and CeFA, and two segmentation datasets NYUv2 and Cityscapes. CASIA-SURF [49] and CeFA [27] are two large face anti-spoofing datasets... NYUv2 [37] and Cityscapes [8] are both two-modality segmentation datasets..."
Dataset Splits: Yes. "This dataset comprises 29,000 samples for training, 1,000 for validation, and 57,000 for testing. Similarly, in CeFA [27], we employ a cross-ethnicity and cross-attack protocol as recommended by the authors, which divides the dataset into training, validation, and testing sets with 35,000, 18,000, and 54,000 samples respectively. There are 5,000 annotated samples, where 2,975 samples are for training, 500 for validation, and 1,525 for testing."
Hardware Specification: No. The paper mentions specific model backbones (e.g., 'ResNet-18', 'ResNet-50') but does not specify any details about the hardware used for training or inference, such as GPU models, CPU types, or memory.
Software Dependencies: No. The paper mentions the use of optimizers (e.g., 'SGD optimizer [34]', 'Adam optimizer [25]') and backbone architectures (e.g., 'ResNet-18 [17]'), but does not provide version numbers for any software dependencies, such as the programming language, libraries, or frameworks (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes. "Experimental Details. For classification on CASIA-SURF and CeFA, the SGD optimizer [34] is used and the batch size is 64. The dimension of the Gaussian distribution is 512. We report the results using the metric of Average Classification Error Rate (ACER). Each modality leverages a separate ResNet-18 [17] as the unimodal encoder. We employ an exponential decay learning rate strategy in which the learning rate is fixed at 1e-3 during the warm-up stage and then decays exponentially. Weight decay and momentum are set to 0.0005 and 0.9, respectively. For segmentation experiments on NYUv2 and Cityscapes, we use the Adam optimizer [25] and set the batch size to 16. The results are evaluated by the metric of mean IoU (mIoU). The learning rate is initialized with 1e-2 and 1e-4 respectively for the two datasets and adapted by the one-cycle scheduler. Following [46], we use ESANet [36] as the backbone. On all datasets, the variances are obtained through a two-layer MLP, where the hidden size is 1024."
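The setup above describes probabilistic embeddings: a 512-dimensional Gaussian whose variance comes from a two-layer MLP with hidden size 1024. As a rough illustration of that wiring (not the authors' released code — the class name `GaussianHead`, the linear mean projection, and the reparameterized sampling are all assumptions), one could sketch the head in PyTorch as:

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Hypothetical sketch of a probabilistic embedding head: maps a unimodal
    feature to a 512-dim Gaussian whose variance comes from a two-layer MLP
    with hidden size 1024, matching the dimensions quoted in the setup."""

    def __init__(self, feat_dim=512, hidden=1024, latent_dim=512):
        super().__init__()
        # Mean of the Gaussian (assumed here to be a single linear projection).
        self.mu = nn.Linear(feat_dim, latent_dim)
        # Two-layer MLP producing the log-variance, hidden size 1024.
        self.log_var = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, feat):
        mu = self.mu(feat)
        log_var = self.log_var(feat)
        # Reparameterization trick: z ~ N(mu, sigma^2), differentiable in mu/log_var.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return z, mu, log_var

head = GaussianHead()
z, mu, log_var = head(torch.randn(4, 512))  # batch of 4 unimodal features
print(z.shape)  # torch.Size([4, 512])
```

Predicting the log-variance (rather than the variance directly) is a common stability choice, since it keeps the variance positive without an explicit constraint; whether PCD does exactly this is an assumption here.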