Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Towards Visualization-of-Thought Jailbreak Attack against Large Visual Language Models
Authors: Hongqiong Zhong, Qingyang Teng, Baolin Zheng, Guanlin Chen, Yingshui Tan, Zhendong Liu, Jiaheng Liu, Wenbo Su, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments, VoTA achieves remarkable effectiveness, improving the average attack success rate (ASR) by 26.71% (from 63.70% to 90.41%) on 9 open-source and 6 commercial VLMs, compared to the state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Hongqiong Zhong¹, Qingyang Teng¹, Baolin Zheng¹, Guanlin Chen¹, Yingshui Tan¹, Zhendong Liu¹, Jiaheng Liu², Wenbo Su¹, Xiaoyong Zhu¹, Bo Zheng¹, Kaifu Zhang¹; ¹Alibaba Group, ²Nanjing University; EMAIL |
| Pseudocode | No | The paper describes its methodology in Section 3 and Figure 2, outlining processes like 'Risk Scenario Generation' and 'Multimodal Thought Construction' through textual descriptions and flowcharts, but does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and dataset are available at https://github.com/Hongqiong12/VoTA. |
| Open Datasets | Yes | Our code and dataset are available at https://github.com/Hongqiong12/VoTA. |
| Dataset Splits | No | The paper describes generating 100 distinct scenarios for each of the 19 subcategories, resulting in a total of 1900 scenarios, which are then used as input for the attack. However, it does not specify explicit training, validation, or test dataset splits for these scenarios in the context of reproducing the VLM evaluation experiments. |
| Hardware Specification | Yes | All experiments except the commercial models were conducted on 8 NVIDIA H20 96GB GPUs equipped with Intel(R) Xeon(R) Platinum 8469C CPUs. |
| Software Dependencies | No | The T2I model in our attack is Stable-Diffusion-3.5-Large [55]. The paper also mentions using Gemini-1.5-Pro and GPT-4o as attack LLMs. However, it does not provide specific version numbers for other key software components or programming languages used in the implementation. |
| Experiment Setup | Yes | For risk scenario generation, we employ a dual-model attack LLM, comprising both Gemini-1.5-Pro and GPT-4o, to synthesize more diverse scenarios. Each model is prompted to generate 100 scenarios per subcategory. The combined outputs are then merged, and human experts perform deduplication to curate a final, diverse set of unique scenarios... for the subsequent risk scenario decomposition stage, we use Gemini-1.5-Pro exclusively as the attack LLM. The T2I model in our attack is Stable-Diffusion-3.5-Large [55]. |
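
The figures quoted in the table can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the authors' repository; the constant and function names are hypothetical. It shows that the reported 26.71% improvement is the absolute difference between the two average ASR values (in percentage points) rather than a gain relative to the baseline, and that 19 subcategories with 100 scenarios each account for the 1900 scenarios mentioned under Dataset Splits.

```python
# Values copied from the table above; all names here are hypothetical.
BASELINE_ASR = 63.70            # prior state-of-the-art average ASR, in percent
VOTA_ASR = 90.41                # average ASR reported for VoTA, in percent
SUBCATEGORIES = 19
SCENARIOS_PER_SUBCATEGORY = 100


def absolute_gain(baseline: float, new: float) -> float:
    """Improvement measured in percentage points (new minus baseline)."""
    return new - baseline


def relative_gain(baseline: float, new: float) -> float:
    """Improvement expressed as a percentage of the baseline value."""
    return (new - baseline) / baseline * 100.0


def total_scenarios(subcategories: int, per_subcategory: int) -> int:
    """Number of scenarios when every subcategory contributes equally."""
    return subcategories * per_subcategory


if __name__ == "__main__":
    points = absolute_gain(BASELINE_ASR, VOTA_ASR)
    relative = relative_gain(BASELINE_ASR, VOTA_ASR)
    print(f"absolute gain: {points:.2f} percentage points")
    print(f"relative gain: {relative:.2f} % of the baseline")
    print(f"scenarios: {total_scenarios(SUBCATEGORIES, SCENARIOS_PER_SUBCATEGORY)}")
```

Keeping the two notions of gain in separate functions makes it explicit which one a quoted percentage refers to when comparing attack success rates across papers.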