Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Towards Building Model/Prompt-Transferable Attackers against Large Vision-Language Models

Authors: Xiaowen Cai, Daizong Liu, Xiaoye Qu, Xiang Fang, Jianfeng Dong, Keke Tang, Pan Zhou, Lichao Sun, Wei Hu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments; 4.1 Implementation Details; 4.2 Main Results; 4.3 Attack Efficiency and Robustness; 4.4 Effectiveness of MI Estimation Networks; 4.5 Visualization; 4.6 Ablation Study
Researcher Affiliation	Academia	(1) Huazhong University of Science and Technology; (2) Wuhan University; (3) Nanyang Technological University; (4) Zhejiang Gongshang University; (5) Guangzhou University; (6) Lehigh University; (7) Peking University
Pseudocode	Yes	The algorithm of our attack is detailed in Appendix G. Algorithm 1 Our Proposed Transfer-Attack based on Informative Constraints of Adversarial/Benign MI
Open Source Code	No	Answer: [No] Justification: We will release the codes upon acceptance.
Open Datasets	Yes	We evaluate the adversarial robustness of three multi-modal datasets for the image captioning, image classification, and VQA tasks. The datasets consist of both images and prompts. The images are collected from DALL-E [87], SVIT [88] and VQAv2 [89]. The prompts for three tasks derive from the CroPA [8].
Dataset Splits	No	The paper mentions specific datasets (DALL-E, SVIT, VQAv2) but does not provide explicit details on how these datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or explicit references to standard splits).
Hardware Specification	Yes	All experiments are conducted on the NVIDIA RTX 4090 GPUs with 24GB of memory.
Software Dependencies	No	The paper mentions using the Adam optimizer and Sentence Transformer [90] but does not provide specific version numbers for any software dependencies like programming languages or libraries.
Experiment Setup	Yes	We train both networks using the Adam optimizer for 100 epochs, with an initial learning rate of 0.01 that decays by a factor of 0.5 every 20 epochs. The number of channels of the encoded image feature map is 2048. To generate transferable examples, the perturbation budget ε is also set to 16/255. The epoch number is set to 1000. The momentum parameter µ is set to 0.9 and the step size is set as α = 16/epoch. Besides, the weights w1 = w2 = 1.
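
The learning-rate schedule and perturbation budget quoted in the Experiment Setup row can be sketched in a few lines of plain Python. This is an illustrative reading of the quoted values, not the authors' code, which has not been released. The function names, the 0-indexed epoch convention, and the interpretation of the budget as a per-component bound are assumptions.

```python
# Illustrative sketch only; names and conventions below are assumptions,
# not taken from the paper's (unreleased) implementation.

EPSILON = 16 / 255  # perturbation budget quoted in the table


def step_decay_lr(epoch, base_lr=0.01, gamma=0.5, step_size=20):
    """Learning rate at a 0-indexed epoch: halved every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


def clip_perturbation(delta, epsilon=EPSILON):
    """Clamp each component of a perturbation into [-epsilon, epsilon].

    Assumes the budget is a per-component (L-infinity style) bound.
    """
    return [max(-epsilon, min(epsilon, d)) for d in delta]


# Learning rates over the 100 training epochs mentioned in the table.
schedule = [step_decay_lr(e) for e in range(100)]
```

Under these assumptions the rate stays at 0.01 for epochs 0 to 19, drops to 0.005 at epoch 20, and is halved again at epochs 40, 60, and 80.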