Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents
Authors: Hao Li, Xiaogeng Liu, CHIU Chun, Dianqi Li, Ning Zhang, Chaowei Xiao
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate the effectiveness of DRIFT on the AgentDojo and ASB benchmark, demonstrating its strong security performance while maintaining high utility across diverse models showcasing both its robustness and adaptability. |
| Researcher Affiliation | Academia | Hao Li¹, Xiaogeng Liu², Hung-Chun Chiu³, Dianqi Li³, Ning Zhang¹, Chaowei Xiao²; ¹Washington University in St. Louis, ²Johns Hopkins University, ³Independent Researcher |
| Pseudocode | No | The paper describes the system architecture and components (Secure Planner, Dynamic Validator, Injection Isolator) and their workflow, but it does not present structured pseudocode or algorithm blocks. Figures 8-13 provide prompts used by different components, which are not considered pseudocode for the overall method. |
| Open Source Code | Yes | The code is released at https://github.com/SaFoLab-WISC/DRIFT. |
| Open Datasets | Yes | We empirically validate the effectiveness of DRIFT on the AgentDojo [24] and ASB [33] benchmark, demonstrating its strong security performance while maintaining high utility across diverse models showcasing both its robustness and adaptability. |
| Dataset Splits | No | The paper mentions collecting '1,000 such samples' for the Secure Planner and '1,000 training samples' for the Injection Isolator. It also states fine-tuning Qwen2.5-7B-Instruct on its 'policy dataset'. However, it does not explicitly provide information on training, validation, and test splits for these collected datasets, nor does it specify how splits were handled for the AgentDojo and ASB benchmarks. |
| Hardware Specification | No | The paper mentions using various LLM models (GPT-4o, GPT-4o-mini, Claude-3-haiku, Claude-3.5-sonnet, Qwen2.5-7B-Instruct) and fine-tuning Qwen2.5-7B-Instruct with specific hyperparameters (batch size 4, 3 epochs, Adam optimizer, learning rate 2e-5). However, it does not specify the hardware used for these experiments, such as GPU or CPU models, memory, or cloud resources. |
| Software Dependencies | No | The paper mentions employing the Adam optimizer [39] and fine-tuning with LoRA [32], but it does not provide specific version numbers for key software components such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used in the implementation. |
| Experiment Setup | Yes | For Qwen2.5-7B-Instruct, we fine-tune it on our policy dataset (described in Section 2.5) using a batch size of 4 and training for three epochs. We employ the Adam optimizer [39] with weight decay and set the initial learning rate to 2e-5. |
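
The reported fine-tuning setup can be written down as a small configuration object. This is a minimal sketch, not the authors' code: the class name `FinetuneConfig`, the helper `total_optimizer_steps`, and the field names are hypothetical. Reading "Adam with weight decay" as AdamW is an assumption, and the weight-decay coefficient is left unset because it is not reported. The sample count in the usage note is illustrative, taken from the '1,000 samples' figure quoted in the Dataset Splits row.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FinetuneConfig:
    """Hyperparameters as reported in the Experiment Setup row (hypothetical container)."""

    base_model: str = "Qwen2.5-7B-Instruct"
    batch_size: int = 4
    num_epochs: int = 3
    learning_rate: float = 2e-5
    # Assumption: "Adam with weight decay" is interpreted as AdamW.
    optimizer: str = "adamw"
    # Not reported in the source, so deliberately left unset.
    weight_decay: Optional[float] = None


def total_optimizer_steps(cfg: FinetuneConfig, num_samples: int,
                          grad_accum_steps: int = 1) -> int:
    """Number of optimizer updates over all epochs.

    Assumes the last partial batch of each epoch is kept (not dropped)
    and that gradient accumulation, if any, is given explicitly.
    """
    effective_batch = cfg.batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * cfg.num_epochs


cfg = FinetuneConfig()
steps = total_optimizer_steps(cfg, num_samples=1000)
print(steps)
```

With 1,000 training samples, a batch size of 4, and no gradient accumulation, this gives 250 updates per epoch and 750 over three epochs. Whether the authors used gradient accumulation or dropped partial batches is not stated, so the count is only indicative.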