Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Towards Visualization-of-Thought Jailbreak Attack against Large Visual Language Models

Authors: Hongqiong Zhong, Qingyang Teng, Baolin Zheng, Guanlin Chen, Yingshui Tan, Zhendong Liu, Jiaheng Liu, Wenbo Su, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive experiments, VoTA achieves remarkable effectiveness, improving the average attack success rate (ASR) by 26.71% (from 63.70% to 90.41%) on 9 open-source and 6 commercial VLMs, compared to the state-of-the-art methods.
Researcher Affiliation | Collaboration | Hongqiong Zhong¹, Qingyang Teng¹, Baolin Zheng¹, Guanlin Chen¹, Yingshui Tan¹, Zhendong Liu¹, Jiaheng Liu², Wenbo Su¹, Xiaoyong Zhu¹, Bo Zheng¹, Kaifu Zhang¹; ¹Alibaba Group, ²Nanjing University
Pseudocode | No | The paper describes its methodology in Section 3 and Figure 2, outlining processes like 'Risk Scenario Generation' and 'Multimodal Thought Construction' through textual descriptions and flowcharts, but does not present any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and dataset are available at https://github.com/Hongqiong12/VoTA.
Open Datasets | Yes | Our code and dataset are available at https://github.com/Hongqiong12/VoTA.
Dataset Splits | No | The paper describes generating 100 distinct scenarios for each of the 19 subcategories, resulting in a total of 1900 scenarios, which are then used as input for the attack. However, it does not specify explicit training, validation, or test dataset splits for these scenarios in the context of reproducing the VLM evaluation experiments.
Hardware Specification | Yes | All experiments except the commercial models were conducted on 8 NVIDIA H20 96GB GPUs equipped with Intel(R) Xeon(R) Platinum 8469C CPUs.
Software Dependencies | No | The T2I model in our attack is Stable-Diffusion-3.5-Large [55]. The paper also mentions using Gemini-1.5-Pro and GPT-4o as attack LLMs. However, it does not provide specific version numbers for other key software components or programming languages used in the implementation.
Experiment Setup | Yes | For risk scenario generation, we employ a dual-model attack LLM, comprising both Gemini-1.5-Pro and GPT-4o, to synthesize more diverse scenarios. Each model is prompted to generate 100 scenarios per subcategory. The combined outputs are then merged, and human experts perform deduplication to curate a final, diverse set of unique scenarios... for the subsequent risk scenario decomposition stage, we use Gemini-1.5-Pro exclusively as the attack LLM. The T2I model in our attack is Stable-Diffusion-3.5-Large [55].
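The ASR figures quoted in the Research Type row are consistent with an absolute difference of two averages (90.41 − 63.70 = 26.71 percentage points), which corresponds to a relative gain of roughly 41.9%. The sketch below makes that arithmetic explicit. It assumes the common definition of ASR as the share of attack attempts judged successful, macro-averaged across models; the helper names and the toy judgments are hypothetical, and only the two averages 63.70 and 90.41 come from the text.

```python
from statistics import mean

def attack_success_rate(judgments: list[bool]) -> float:
    """ASR in percent: share of attempts judged successful (assumed definition)."""
    if not judgments:
        raise ValueError("no attack attempts")
    return 100.0 * sum(judgments) / len(judgments)

def average_asr(per_model: dict[str, list[bool]]) -> float:
    """Unweighted mean of per-model ASRs (macro average; an assumption)."""
    return mean(attack_success_rate(j) for j in per_model.values())

def improvement(baseline: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 2)

# Hypothetical toy judgments for two models; not results from the paper.
toy = {"model_a": [True, True, False, True], "model_b": [True, False, False, False]}
print(average_asr(toy))  # 50.0

# Averages quoted above: 63.70% (prior state of the art) and 90.41% (VoTA).
print(improvement(63.70, 90.41))  # (26.71, 41.93)
```

Reporting both numbers avoids the ambiguity of a bare "26.71%", which could otherwise be read as a relative improvement.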
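The Software Dependencies row notes that no version numbers are given. As a hedged illustration of what such a record could look like, the sketch below uses only the Python standard library to capture the interpreter version and the installed versions of a list of distributions. The package names passed in are hypothetical examples, not dependencies confirmed by the paper or its repository.

```python
import platform
from importlib import metadata
from typing import Optional

def record_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Return the Python version plus each distribution's installed version (None if absent)."""
    versions: dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

# Hypothetical package list; the paper does not state which libraries it used.
print(record_versions(["torch", "diffusers", "transformers"]))
```

Saving this dictionary next to the experiment outputs would pin the environment that produced them.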
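The merge-and-deduplicate step in the Experiment Setup row can be sketched as follows, assuming each attack LLM is wrapped as a callable that returns scenario strings. The `ScenarioGenerator` interface, the offline stubs, the `normalize` heuristic (a crude automatic stand-in for the human deduplication the paper actually uses), and the first-come cap are all hypothetical; no Gemini or GPT-4o API is called. The target of 100 scenarios per subcategory across 19 subcategories (1900 total) follows the Dataset Splits row.

```python
from typing import Callable

# Hypothetical interface: (subcategory, n) -> list of n scenario descriptions.
ScenarioGenerator = Callable[[str, int], list[str]]

def normalize(text: str) -> str:
    """Lower-case and collapse whitespace; a crude stand-in for human deduplication."""
    return " ".join(text.lower().split())

def build_scenarios(subcategories: list[str], generators: list[ScenarioGenerator],
                    per_model: int = 100, keep: int = 100) -> dict[str, list[str]]:
    """Merge every generator's outputs per subcategory, drop duplicates, cap at `keep`."""
    pool: dict[str, list[str]] = {}
    for sub in subcategories:
        seen: set[str] = set()
        merged: list[str] = []
        for generate in generators:          # e.g. one wrapper per attack LLM
            for scenario in generate(sub, per_model):
                key = normalize(scenario)
                if key not in seen:          # skip repeats across models
                    seen.add(key)
                    merged.append(scenario)
        pool[sub] = merged[:keep]            # first-come cap; human curation would differ
    return pool

# Offline stubs standing in for the two attack LLMs.
def stub_a(sub: str, n: int) -> list[str]:
    return [f"{sub} scenario {i}" for i in range(n)]

def stub_b(sub: str, n: int) -> list[str]:
    # First half repeats stub_a up to case and spacing; second half is new.
    return ([f"{sub.upper()}  SCENARIO {i}" for i in range(n // 2)]
            + [f"{sub} variant {i}" for i in range(n // 2)])

subs = [f"subcategory_{k}" for k in range(19)]
pool = build_scenarios(subs, [stub_a, stub_b])
print(sum(len(v) for v in pool.values()))  # 1900
```

With these stubs each subcategory yields 150 unique candidates before the cap. The later decomposition stage (Gemini-1.5-Pro only) and the Stable-Diffusion-3.5-Large image generation would consume `pool` downstream and are not sketched here.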