Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
Authors: Yangyang Guo, Fangkai Jiao, Liqiang Nie, Mohan Kankanhalli
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To better understand this and analyze the underlying reasons, we conduct experiments using six state-of-the-art VLLMs (see Table 2) from several perspectives. |
| Researcher Affiliation | Academia | Yangyang Guo, National University of Singapore; Fangkai Jiao, Nanyang Technological University and I2R, A*STAR; Liqiang Nie, Harbin Institute of Technology (Shenzhen); Mohan Kankanhalli, National University of Singapore |
| Pseudocode | No | The paper describes methodologies in prose and through experimental results, but it does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The NeurIPS Paper Checklist for 'Open access to data and code' states: 'NA Justification: We use existing code and data.' This indicates the paper does not release new code for the described methodology. |
| Open Datasets | Yes | We primarily conduct experiments on four available mainstream jailbreak datasets, as detailed in Table 1: VLSafe [25], FigStep [27], MM-SafetyBench [26], VLGuard [18]. The images can be benign ones sourced from MSCOCO [31]. |
| Dataset Splits | No | The paper evaluates existing models and defense mechanisms on established jailbreak datasets (VLSafe, FigStep, MM-SafetyBench, VLGuard), and describes using certain categories or rephrased questions from them. However, it does not specify explicit train/test/validation splits that the authors applied for their own experimental setup or for any models they trained (e.g., the LLM-Pipeline detector). |
| Hardware Specification | No | The NeurIPS Paper Checklist for 'Experiments compute resources' states 'Yes' and 'As explained in the main manuscript and supplementary materials.' However, upon review, neither the main manuscript nor the supplementary materials provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions various VLLMs and LLMs used (e.g., LLaVA-1.5, InternVL2, QWen2-VL, Llama3.1, Mistral) and refers to GPT-4, but it does not provide a comprehensive list of specific software dependencies with their version numbers (e.g., Python, PyTorch, CUDA versions) required to reproduce the experimental environment. |
| Experiment Setup | No | The NeurIPS Paper Checklist for 'Experimental setting/details' states: 'NA Justification: We don't have hyperparameters.' The paper evaluates existing VLLMs and defense strategies and proposes an LLM-Pipeline detector, but does not provide specific hyperparameter values or detailed training configurations for any models it trains or fine-tunes. |