Quantifying and Enhancing Multi-modal Robustness with Modality Preference
Authors: Zequn Yang, Yake Wei, Ce Liang, Di Hu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method demonstrates substantial improvements in performance and robustness compared with existing methods. Furthermore, our training procedure can be easily extended to enhance other robust training strategies, highlighting its credibility and flexibility. |
| Researcher Affiliation | Academia | Zequn Yang, Yake Wei, Ce Liang, Di Hu; Gaoling School of Artificial Intelligence, Renmin University of China; {zqyang,yakewei,liangce158,dihu}@ruc.edu.cn |
| Pseudocode | No | Subsequently, we propose a two-step training procedure called Certifiable Robust Multi-modal Training (CRMT), which can credibly obtain a robust multi-modal model. The training procedure of CRMT is as follows. Step 1: optimize with cross-entropy loss and margin regularization weighted by $\rho$: $\min_{a^{(m)}, W^{(m)}, \phi^{(m)}} \rho \mathcal{L}_1 + \frac{1}{N}\sum_{i=1}^{N} \mathrm{CE}(h(x_i), y_i)$, where $\mathrm{CE}$ is the cross-entropy loss function. Step 2: fix $W^{(m)}$ and $\phi^{(m)}$, and update $a^{(m)}$ to approach higher certified robustness: $\min_{a^{(m)}} \mathcal{L}_2 = \frac{1}{N}\sum_{i=1}^{N} r(x_i)$, where $r(x)$ is the lower bound in Equation 10. (A hedged code sketch of this procedure follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on different datasets including Kinetics-Sounds (Audio + Vision) (Arandjelovic & Zisserman, 2017), UCF101 (Optical flow + RGB) (Soomro et al., 2012), and VGGSound (Audio + Vision) (Chen et al., 2020). |
| Dataset Splits | No | We use the backbone ResNet18 (He et al., 2016) as the encoder for each uni-modality. Details about these datasets are presented in Appendix 8.1. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'ResNet18 (He et al., 2016) as the encoder for each uni-modality' but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | No | The paper discusses the training procedure (two steps, optimization objectives) but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) for the experiments. |
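
Below is a minimal sketch of the CRMT two-step procedure quoted in the Pseudocode row, assuming a PyTorch-style multi-modal model. The names `margin_regularizer`, `certified_radius`, and `model.modality_weights` are hypothetical placeholders: the paper's actual $\mathcal{L}_1$ term and Equation 10 bound are not reproduced in this summary, so a multi-class margin is used as a differentiable stand-in for both. The sign convention in Step 2 is likewise an assumption: since the stated goal is higher certified robustness, the sketch descends on the negative mean of the bound.

```python
import torch
import torch.nn.functional as F

def multiclass_margin(logits, y):
    """Margin between the true-class logit and the best wrong-class logit."""
    correct = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    runner_up = logits.scatter(1, y.unsqueeze(1), float("-inf")).max(dim=1).values
    return correct - runner_up

def margin_regularizer(logits, y):
    # Placeholder for the paper's L1 term: a hinge encouraging large margins.
    # The exact regularizer is defined in the paper, not here.
    return F.relu(1.0 - multiclass_margin(logits, y)).mean()

def certified_radius(logits, y):
    # Placeholder proxy for the Equation 10 lower bound r(x); the true bound
    # also involves per-branch weight/Lipschitz terms.
    return multiclass_margin(logits, y)

def crmt_step1(model, loader, rho=0.1, epochs=10, lr=1e-3):
    """Step 1: jointly train encoders phi^(m), classifiers W^(m), and
    modality weights a^(m) with rho * L1 + mean cross-entropy."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            logits = model(x)  # fused multi-modal prediction h(x)
            loss = rho * margin_regularizer(logits, y) + F.cross_entropy(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()

def crmt_step2(model, loader, epochs=10, lr=1e-3):
    """Step 2: freeze W^(m) and phi^(m); update only a^(m) so the
    certified-radius lower bound grows (assumed sign convention)."""
    for p in model.parameters():
        p.requires_grad_(False)
    for a in model.modality_weights:  # hypothetical container of the a^(m)
        a.requires_grad_(True)
    opt = torch.optim.SGD(list(model.modality_weights), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            loss = -certified_radius(model(x), y).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The paper notes that its procedure "can be easily extended to enhance other robust training strategies"; in this sketch, that would amount to running `crmt_step2` after any Step 1 variant.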