Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Authors: Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments considering 6 widely-used LLMs using forbidden questions from JailbreakBench [27]. Without using guardrail, JAIL-CON achieves an average attack success rate (ASR) of 0.95, significantly higher than other existing methods. When the guardrail is applied, JAIL-CON exhibits a significantly lower filtering rate compared to direct answer generation methods and is second only to encoding-based ones (e.g., Base64). Considering only harmful answers that can bypass the guardrail's filtering, JAIL-CON achieves an ASR of 0.64, significantly better than the second-place attack of 0.27. |
| Researcher Affiliation | Academia | Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang; CISPA Helmholtz Center for Information Security, Saarbrücken, Germany 66123 |
| Pseudocode | Yes | As shown in Figure 2, JAIL-CON is an iteration that consists mainly of three steps, where step 2 offers two variations. Roughly speaking, in each iteration, JAIL-CON performs the following steps. Step 1: Task Combination. For a given harmful task $t_{harm,i}$ from the harmful set $T_{harm}$, JAIL-CON first selects an auxiliary task $t_{aux,j}$ from the auxiliary set $T_{aux}$ and combines (parallelizes) them into a concurrent task $t_{con,i,j}$ through a combination unit $C$ for later usage. Step 2: Concurrent Execution. In step 2, JAIL-CON can perform both variants (CVT and CIT) or just one. In CVT, JAIL-CON queries the target LLM $\theta$ using the CVT context and the concurrent task $t_{con,i,j}$, forcing the LLM to generate concurrent answers $a_{CVT,i,j}$ to both harmful and auxiliary tasks. In CIT, different from CVT, by using the CIT context, JAIL-CON causes the target LLM to output blank placeholder information in a skip-word manner, which is considered an idle task while answering the harmful task. In CIT, the concurrent answer $a_{CIT,i,j}$ is generated by $\theta$. Step 3: Shadow Judge. In the last step, an answer extractor $E$ and a shadow judge model $J$ are used to extract the harmful answer from the concurrent answer ($a_{CVT,i,j}$ or $a_{CIT,i,j}$) and judge the success of the attack. |
| Open Source Code | Yes | Our code is available at https://github.com/TrustAIRLab/JAIL-CON. |
| Open Datasets | Yes | To assess the ability of LLMs to solve concurrent tasks, we first construct concurrent datasets for GSM8K [25] and TruthfulQA [26]. ... We conduct extensive experiments considering 6 widely-used LLMs using forbidden questions from JailbreakBench [27]. |
| Dataset Splits | No | To assess the ability of LLMs to solve concurrent tasks, we first construct concurrent datasets for GSM8K [25] and TruthfulQA [26]. Following the demonstrations on the right side of Figure 1b, we begin by sampling two sequential questions from the evaluation dataset to conduct the concurrency evaluation. The k-th sample in our evaluation datasets is formed by combining the k-th and ((k + 1) mod k)-th sample from GSM8K or TruthfulQA. |
| Hardware Specification | Yes | Our experiments are conducted on NVIDIA A100-80GB GPUs. |
| Software Dependencies | No | In this work, we use the following APIs or platforms to query models or load model checkpoints. GPT-4o: Query gpt-4o-2024-08-06 via https://api.openai.com/v1. ... LLaMA2-7B: Load meta-llama/Llama-2-7b-chat from Hugging Face. |
| Experiment Setup | Yes | Implementation Details. In JAIL-CON, we set the maximum number of iterations M to 50. ... For PAIR, we set the number of streams and the maximum depth to 30 and 3, and deploy Vicuna-13B and GPT-4o mini as the attack and judge models, respectively. For GPTFuzzer, the maximum number of iterations and energy are set to 100 and 1, and GPT-4o mini is used to perform mutations. For FlipAttack, we use its well-performing flip char in sentence mode. For JAM, we optimize its cipher characters for 100 iterations on each harmful task. For TAP, the branching factor, width, and depth are set to 4, 4, and 10, respectively. ... To ensure reproducibility, we set the temperature of all LLMs to 0. |
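The Research Type row reports two numbers. The first is an average ASR across six models. The second is a stricter ASR that counts only harmful answers that also get past the guardrail.

The sketch below shows one way the two metrics could be computed. It makes three assumptions:

- Each attempt is stored as a hypothetical `Attempt` record holding a judge verdict and a guardrail flag. These names are illustrative and are not taken from the authors' code.
- "Average" means an unweighted mean over per-model ASRs. This is also an assumption.
- The judge itself is not modeled. `judged_harmful` is simply whatever verdict the external judge produced.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Attempt:
    """One task attacked on one model (hypothetical record format)."""
    model: str
    judged_harmful: bool    # verdict supplied by an external judge
    passed_guardrail: bool  # True if the guardrail did not filter the answer


def asr(attempts, require_guardrail_bypass=False):
    """Fraction of attempts counted as successful.

    With require_guardrail_bypass=True, an attempt counts only if the judge
    flagged it AND the guardrail let it through (the stricter metric).
    """
    if not attempts:
        return 0.0
    hits = [
        a.judged_harmful and (a.passed_guardrail or not require_guardrail_bypass)
        for a in attempts
    ]
    return sum(hits) / len(hits)


def average_asr(attempts, require_guardrail_bypass=False):
    """Unweighted mean of per-model ASRs (assumed averaging scheme)."""
    models = sorted({a.model for a in attempts})
    if not models:
        return 0.0
    return mean(
        asr([a for a in attempts if a.model == m], require_guardrail_bypass)
        for m in models
    )
```

Consider two toy models with two attempts each. Suppose model A succeeds twice, but one of its answers is filtered by the guardrail, and model B succeeds once. Then `average_asr` returns 0.75 without the guardrail condition and 0.5 with it. When every model is evaluated on the same number of tasks, this per-model mean equals the pooled ASR.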
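The Pseudocode row mentions an answer extractor $E$ that separates the two interleaved answers inside a concurrent answer. The exact output format is not quoted on this page.

The sketch below therefore assumes a hypothetical convention:

- One task's words appear wrapped in curly braces, as `{word}`.
- The other task's words appear bare.
- An empty `{}` marks an idle slot, as in the CIT variant.

It illustrates only the string-splitting idea, on a benign example. It is not the authors' extractor.

```python
import re

# Hypothetical token format: "{...}" is a braced word (possibly empty),
# anything else without whitespace or braces is a bare word.
_TOKEN = re.compile(r"\{([^{}]*)\}|([^\s{}]+)")


def split_concurrent(answer):
    """Return (bare_stream, braced_stream) recovered from an interleaved answer."""
    bare, braced = [], []
    for m in _TOKEN.finditer(answer):
        if m.group(1) is not None:       # matched a {...} group
            word = m.group(1).strip()
            if word:                     # drop idle "{}" placeholders
                braced.append(word)
        else:
            bare.append(m.group(2))
    return " ".join(bare), " ".join(braced)
```

Under this convention, `"The {Two} capital {plus} of {two} France {is} is {four} Paris {}"` splits into `"The capital of France is Paris"` and `"Two plus two is four"`. An answer in which every braced slot is empty yields an empty second stream.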
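The Dataset Splits row describes building each concurrent sample from a question and its successor. As extracted, the index reads `((k + 1) mod k)`, but that expression evaluates to 1 for every k > 1.

The sketch below therefore makes two assumptions:

- The modulus is the dataset size `n`, so the last question wraps around to pair with the first.
- Words are interleaved one by one, with the second question's words wrapped in curly braces. This is a hypothetical format chosen for illustration.

Applied to GSM8K or TruthfulQA questions, the sketch produces benign concurrent samples of the kind used for the capability evaluation.

```python
from itertools import zip_longest


def interleave(first, second):
    """Interleave two questions word by word; words of `second` are braced.

    If `second` is shorter, the leftover words of `first` get an empty "{}"
    placeholder; if `first` is shorter, the leftover braced words of `second`
    are emitted without a bare word between them.
    """
    parts = []
    for a, b in zip_longest(first.split(), second.split()):
        if a is not None:
            parts.append(a)
        parts.append("{" + (b or "") + "}")
    return " ".join(parts)


def build_concurrent_dataset(samples):
    """Pair sample k with sample (k + 1) mod n (assumed wrap-around)."""
    n = len(samples)
    return [interleave(samples[k], samples[(k + 1) % n]) for k in range(n)]
```

For example, `build_concurrent_dataset(["What is 2+2?", "Name a prime."])` yields `"What {Name} is {a} 2+2? {prime.}"` for the first sample, and the mirrored pairing for the second.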
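The Experiment Setup row scatters hyperparameters across JAIL-CON and several baselines. For anyone re-running the comparison, the quoted values can be collected in one place.

The sketch below is a hypothetical frozen Python dataclass whose field names are our own. It records only the values quoted in the row; the settings hidden behind the "..." elisions are deliberately left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are ours)."""
    jailcon_max_iterations: int = 50          # M in JAIL-CON
    temperature: float = 0.0                  # applied to all LLMs
    pair_streams: int = 30
    pair_max_depth: int = 3
    pair_attack_model: str = "Vicuna-13B"
    pair_judge_model: str = "GPT-4o mini"
    gptfuzzer_max_iterations: int = 100
    gptfuzzer_energy: int = 1
    gptfuzzer_mutation_model: str = "GPT-4o mini"
    flipattack_mode: str = "flip char in sentence"
    jam_cipher_iterations: int = 100          # per harmful task
    tap_branching_factor: int = 4
    tap_width: int = 4
    tap_depth: int = 10
```

Freezing the dataclass keeps a run's settings from being mutated mid-experiment, and `dataclasses.asdict(ExperimentConfig())` gives a plain dict that can be logged next to results.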