Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
One Token Embedding Is Enough to Deadlock Your Large Reasoning Model
Authors: Mohan Zhang, Yihua Zhang, Jinghan Jia, Zhangyang "Atlas" Wang, Sijia Liu, Tianlong Chen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on four LRMs (including Phi-RM, Nemotron-Nano, R1-Qwen, and R1-Llama) across three benchmarks (including GSM8K, MATH500, and MMLU-Pro), demonstrating that the Deadlock Attack is highly effective and stealthy, achieving high attack success rates with minimal impact on benign input performance. |
| Researcher Affiliation | Academia | ¹University of North Carolina at Chapel Hill, ²Michigan State University, ³University of Texas at Austin |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Our code will be published after acceptance |
| Open Datasets | Yes | We use three mathematical reasoning benchmarks: GSM8K [68], the math category of MMLU-Pro [69], and MATH500 [70]. To further test the attack's robustness and rule out potential false positives on inherently difficult problems, we also evaluate on the highly challenging AIME 2024 benchmark [71]. |
| Dataset Splits | Yes | For training the adversarial embedding under (4), we curated a dataset by selecting the first 30 samples from the MATH500 dataset at level 5. For each sample, we generated 100 distinct reasoning answers using the R1-Qwen model. Twenty of these samples (with their corresponding answers) formed the training set, while the remaining 10 served as a validation set to monitor the attack loss during optimization. [...] To align with the attack evaluation, we use the same test sets (50 samples each for GSM8K and MMLU-Pro, and the 43-sample Level 1 subset for MATH500). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. It only vaguely mentions that 'Our experiments are highly resource-intensive and time-consuming'. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions). |
| Experiment Setup | Yes | The adversarial embedding was trained using the Adam [74] optimizer with a learning rate of 10⁻³ for 1000 steps to ensure convergence. During training, one (problem, answer) pair was randomly sampled from the training set at each step. Unless otherwise specified, all experiments employ an adversarial embedding of length L = 1, and the backdoor trigger is instantiated as the single token "!!!!!". |
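
The Dataset Splits and Experiment Setup rows describe a split by problem (30 problems, 100 generated answers each, 20 problems for training and 10 for validation) and a schedule that draws one (problem, answer) pair per step for 1000 steps. The sketch below illustrates only that bookkeeping and is not the authors' code. The function names, the placeholder strings, the fixed seed, and the choice of the first 20 problems as the training set are all assumptions; the quoted text does not say which 20 problems were used.

```python
import random

# Counts taken from the quoted paper text.
NUM_PROBLEMS = 30          # first 30 Level-5 MATH500 samples
ANSWERS_PER_PROBLEM = 100  # reasoning answers generated per problem
NUM_TRAIN_PROBLEMS = 20    # remaining 10 problems form the validation set
NUM_STEPS = 1000           # optimization steps


def build_splits(problems, answers_per_problem, num_train):
    """Pair every problem with its answers, then split by problem.

    Assumption: the first `num_train` problems are used for training.
    Answers are placeholder strings standing in for generated text.
    """
    def pairs_for(problem):
        return [(problem, f"{problem}-answer-{i}") for i in range(answers_per_problem)]

    train = [pair for p in problems[:num_train] for pair in pairs_for(p)]
    val = [pair for p in problems[num_train:] for pair in pairs_for(p)]
    return train, val


def sample_schedule(train_pairs, num_steps, seed=0):
    """Draw one (problem, answer) pair uniformly at random for each step."""
    rng = random.Random(seed)  # hypothetical seed, for repeatability only
    return [rng.choice(train_pairs) for _ in range(num_steps)]


problems = [f"problem-{i}" for i in range(NUM_PROBLEMS)]
train_pairs, val_pairs = build_splits(problems, ANSWERS_PER_PROBLEM, NUM_TRAIN_PROBLEMS)
schedule = sample_schedule(train_pairs, NUM_STEPS)

print(len(train_pairs), len(val_pairs), len(schedule))
```

Splitting by problem rather than by pair keeps every answer to a validation problem out of training, which is what allows the validation set to monitor the attack loss on unseen problems.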