Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Understanding and Enhancing the Transferability of Jailbreaking Attacks

Authors: Runqi Lin, Bo Han, Fengwang Li, Tongliang Liu

ICLR 2025

Reproducibility variables, each with the classifier's result and the supporting LLM response:
Research Type: Experimental. "Extensive experiments demonstrate that PiF provides an effective and efficient red-teaming evaluation for proprietary LLMs."
Researcher Affiliation: Academia. Runqi Lin, Sydney AI Centre, The University of Sydney; Bo Han, Hong Kong Baptist University; Fengwang Li, The University of Sydney; Tongliang Liu, Sydney AI Centre, The University of Sydney.
Pseudocode: Yes. "The three-stage PiF algorithm is summarised in Algorithm 1." (Appendix A, Algorithm 1: Perceived-importance Flatten Method)
Open Source Code: Yes. "Our implementation can be found at https://github.com/tmllab/2025_ICLR_PiF."
Open Datasets: Yes. "We evaluate our approach on two benchmark datasets: AdvBench (Zou et al., 2023) and MaliciousInstruct (Huang et al., 2023a), which contain 520 and 100 malicious inputs, respectively." Table 7 lists the dataset links: AdvBench at https://github.com/llm-attacks/llm-attacks/tree/main/data/advbench and MaliciousInstruct at https://github.com/Princeton-SysML/Jailbreak_LLM/blob/main/data.
Dataset Splits: No. The paper states that AdvBench and MaliciousInstruct "contain 520 and 100 malicious inputs, respectively," but does not specify how these datasets are split into training, validation, or test sets for the authors' experiments.
Hardware Specification: No. The paper does not describe the hardware used for its experiments, such as GPU models (e.g., NVIDIA A100) or CPU models. The acknowledgments mention the National Computational Infrastructure (NCI Australia), but no specific hardware details are given.
Software Dependencies: No. The paper mentions models such as BERT-Large, Llama-2, and GPT-2-Large, and components such as a perplexity filter that likely rely on libraries like Hugging Face Transformers, but it provides no version numbers for these software components or for the programming language (e.g., Python).
Experiment Setup: Yes. "Setup for PiF. We employ BERT-Large (Devlin et al., 2019) as the source model with the evaluation template 'This intent is [MASK].' The hyperparameters are configured as follows: the number of iterations T is set to 50; the temperature τ is set to 0.25; the threshold Θ is set to 0.85; and the values of N, M, and K are all set to 15."
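The reported hyperparameters can be captured in a small sketch. The configuration dict below mirrors the values quoted above, and the `perceived_importance` helper is a hypothetical illustration of a temperature-scaled softmax (with τ = 0.25, a low temperature that sharpens the distribution over tokens); neither the dict keys nor the function are taken from the PiF release.

```python
import math

# Hyperparameter values quoted from the paper's setup; key names are our own.
PIF_CONFIG = {
    "iterations_T": 50,       # number of iterations T
    "temperature_tau": 0.25,  # softmax temperature τ
    "threshold_theta": 0.85,  # stopping threshold Θ
    "N": 15, "M": 15, "K": 15,
}

def perceived_importance(logits, tau):
    """Temperature-scaled softmax over per-token logits.

    A low tau (e.g. 0.25) sharpens the distribution, concentrating
    perceived importance on the highest-scoring tokens.
    """
    scaled = [x / tau for x in logits]
    m = max(scaled)                           # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, for logits `[2.0, 1.0]` the τ = 0.25 distribution assigns noticeably more mass to the first token than a plain (τ = 1.0) softmax would.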
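Since Algorithm 1 itself is not reproduced here, the following is only an illustrative sketch of the kind of iterative importance-flattening loop the summary describes: while some token's perceived importance exceeds the threshold Θ = 0.85, replace the most salient token with the candidate that most reduces the maximum importance. The function name, signature, and the caller-supplied `score_fn`/`candidates_fn` stubs are all our assumptions, not the authors' implementation.

```python
def pif_flatten(tokens, score_fn, candidates_fn, iterations=50, threshold=0.85):
    """Hypothetical importance-flattening loop (not Algorithm 1 verbatim).

    score_fn(tokens)        -> list of per-token importance scores
    candidates_fn(tokens,i) -> replacement candidates for position i
    """
    tokens = list(tokens)
    for _ in range(iterations):
        scores = score_fn(tokens)
        idx = max(range(len(tokens)), key=lambda i: scores[i])
        if scores[idx] < threshold:
            break  # importance is already flat enough; stop early
        best_token, best_max = tokens[idx], scores[idx]
        for cand in candidates_fn(tokens, idx):
            trial = tokens[:idx] + [cand] + tokens[idx + 1:]
            trial_max = max(score_fn(trial))
            if trial_max < best_max:  # keep the candidate that flattens most
                best_token, best_max = cand, trial_max
        tokens[idx] = best_token
    return tokens
```

In the paper's setup, `score_fn` would presumably be derived from the BERT-Large source model and its evaluation template; a toy scorer is enough to exercise the loop.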