Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism

Authors: Beitao Chen, Xinyu Lyu, Shengming Yuan, Jingkuan Song, Hengtao Shen, Lianli Gao

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations across three MLLMs and five benchmarks demonstrate SafePTR's state-of-the-art performance in mitigating jailbreak risks without compromising utility. Our code is available at https://github.com/BT-C/SafePTR.
Researcher Affiliation	Academia	1 Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China; 2 Southwestern University of Finance and Economics, Chengdu, China; 3 Engineering Research Center of Intelligent Finance, Ministry of Education; 4 Tongji University
Pseudocode	No	The paper describes the SafePTR framework in Section 3 and illustrates its components through textual explanations and a block diagram (Fig. 5). However, it does not include a dedicated pseudocode or algorithm block with structured, step-by-step procedures.
Open Source Code	Yes	Our code is available at https://github.com/BT-C/SafePTR.
Open Datasets	Yes	We use JailbreakV-28K [Luo et al., 2024b] (text-driven), MM-SafetyBench [Liu et al., 2023b], and FigStep [Gong et al., 2025] (image-driven)... Benign task accuracy is measured on MME [Fu et al., 2023] and MM-Vet [Yu et al., 2024]...
Dataset Splits	No	The paper mentions using datasets such as FigStep (500) and MM-SafetyBench (5040) and refers to a 'unified test set'. However, it does not state how the datasets are split into training, validation, or test sets: it gives no percentages, no absolute sample counts per split, and no explicit references to predefined splits.
Hardware Specification	Yes	All experiments are conducted on four RTX3090 GPUs.
Software Dependencies	No	Following Immune [Ghosal et al., 2024], we implement the proposed SafePTR using Hugging Face Transformers library. The LLaVA1.5-7B results are based on version 1.2.2 from the official benchmark repository. The paper mentions 'Hugging Face Transformers library' but does not specify a version number for this library. 'LLaVA1.5-7B version 1.2.2' refers to a model version, not a software dependency version.
Experiment Setup	Yes	We set the number of tokens sampled k = 10%. For LLaVA-1.5-7B, DeepSeek-VL2, and MiniGPT-4-7B, harmful tokens are pruned in layers [7, 9), [4, 6), [7, 9).
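
The settings quoted in the Experiment Setup row can be written down as a small configuration, which makes the half-open layer intervals explicit: [7, 9) covers layers 7 and 8 and excludes 9. The following is a minimal sketch only, not the authors' code. All names (`TOKEN_SAMPLE_RATIO`, `PRUNE_LAYER_INTERVALS`, `prune_layer_indices`, `sampled_token_count`) are hypothetical, and two points are assumptions that the quoted text does not settle: that k = 10% is applied as a fraction of the token count and rounded down, and that the layer indices follow whatever numbering convention the paper uses.

```python
# Hypothetical configuration mirroring the Experiment Setup row.
# Names are illustrative and do not come from the SafePTR repository.

TOKEN_SAMPLE_RATIO = 0.10  # "k = 10%" as quoted in the table

# Half-open intervals [start, stop) of layers where harmful tokens are pruned.
PRUNE_LAYER_INTERVALS = {
    "LLaVA-1.5-7B": (7, 9),
    "DeepSeek-VL2": (4, 6),
    "MiniGPT-4-7B": (7, 9),
}


def prune_layer_indices(model_name):
    """Expand a half-open interval [start, stop) into explicit layer indices."""
    start, stop = PRUNE_LAYER_INTERVALS[model_name]
    return list(range(start, stop))  # range() is half-open as well


def sampled_token_count(num_tokens, ratio=TOKEN_SAMPLE_RATIO):
    """Number of tokens for a given ratio.

    Assumption: round down and keep at least one token. The quoted setup
    does not say how fractional counts are handled.
    """
    return max(1, int(num_tokens * ratio))


if __name__ == "__main__":
    for name in PRUNE_LAYER_INTERVALS:
        print(name, prune_layer_indices(name))
```

Under these assumptions, each listed interval spans exactly two layers per model.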