Quantifying and Enhancing Multi-modal Robustness with Modality Preference
Authors: Zequn Yang, Yake Wei, Ce Liang, Di Hu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method demonstrates substantial improvements in performance and robustness compared with existing methods. Furthermore, our training procedure can be easily extended to enhance other robust training strategies, highlighting its credibility and flexibility. |
| Researcher Affiliation | Academia | Zequn Yang, Yake Wei, Ce Liang, Di Hu; Gaoling School of Artificial Intelligence, Renmin University of China; {zqyang,yakewei,liangce158,dihu}@ruc.edu.cn |
| Pseudocode | No | Subsequently, we propose a two-step training procedure called Certifiable Robust Multi-modal Training (CRMT), which can credibly obtain a robust multi-modal model. The training procedure of CRMT is as follows. Step 1: optimize with cross-entropy loss and margin regularization weighted by $\rho$: $\min_{a^{(m)}, W^{(m)}, \phi^{(m)}} \rho \mathcal{L}_1 + \frac{1}{N}\sum_{i=1}^{N} \mathrm{CE}(h(x_i), y_i)$, where $\mathrm{CE}$ is the cross-entropy loss function. Step 2: fix $W^{(m)}$ and $\phi^{(m)}$, and update $a^{(m)}$ to approach higher certified robustness: $\min_{a^{(m)}} \mathcal{L}_2 = \frac{1}{N}\sum_{i=1}^{N} r(x_i)$, where $r(x)$ is the lower bound in Equation 10. (A hedged code sketch of this procedure follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on different datasets including Kinetics-Sounds (Audio + Vision) (Arandjelovic & Zisserman, 2017), UCF101 (Optical flow + RGB) (Soomro et al., 2012), and VGGSound (Audio + Vision) (Chen et al., 2020). |
| Dataset Splits | No | We use the backbone ResNet18 (He et al., 2016) as the encoder for each uni-modality. Details about these datasets are presented in Appendix 8.1. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'ResNet18 (He et al., 2016) as the encoder for each uni-modality' but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | No | The paper discusses the training procedure (two steps, optimization objectives) but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) for the experiments. |
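
Below is a minimal sketch of the CRMT two-step procedure quoted in the Pseudocode row, assuming a PyTorch-style multi-modal model. The names `margin_regularizer`, `certified_radius`, and `model.modality_weights` are hypothetical placeholders: the paper's actual $\mathcal{L}_1$ term and Equation 10 bound are not reproduced in this summary, so a multi-class margin is used as a differentiable stand-in for both. The sign convention in Step 2 is likewise an assumption: since the stated goal is higher certified robustness, the sketch descends on the negative mean of the bound.

```python
import torch
import torch.nn.functional as F

def multiclass_margin(logits, y):
    """Margin between the true-class logit and the best wrong-class logit."""
    correct = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    runner_up = logits.scatter(1, y.unsqueeze(1), float("-inf")).max(dim=1).values
    return correct - runner_up

def margin_regularizer(logits, y):
    # Placeholder for the paper's L1 term: a hinge encouraging large margins.
    # The exact regularizer is defined in the paper, not here.
    return F.relu(1.0 - multiclass_margin(logits, y)).mean()

def certified_radius(logits, y):
    # Placeholder proxy for the Equation 10 lower bound r(x); the true bound
    # also involves per-branch weight/Lipschitz terms.
    return multiclass_margin(logits, y)

def crmt_step1(model, loader, rho=0.1, epochs=10, lr=1e-3):
    """Step 1: jointly train encoders phi^(m), classifiers W^(m), and
    modality weights a^(m) with rho * L1 + mean cross-entropy."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            logits = model(x)  # fused multi-modal prediction h(x)
            loss = rho * margin_regularizer(logits, y) + F.cross_entropy(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()

def crmt_step2(model, loader, epochs=10, lr=1e-3):
    """Step 2: freeze W^(m) and phi^(m); update only a^(m) so the
    certified-radius lower bound grows (assumed sign convention)."""
    for p in model.parameters():
        p.requires_grad_(False)
    for a in model.modality_weights:  # hypothetical container of the a^(m)
        a.requires_grad_(True)
    opt = torch.optim.SGD(list(model.modality_weights), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            loss = -certified_radius(model(x), y).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The paper notes that its procedure "can be easily extended to enhance other robust training strategies"; in this sketch, that would amount to running `crmt_step2` after any Step 1 variant.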