AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

Authors: Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate AGENTPOISON's effectiveness in attacking three types of real-world LLM agents: a RAG-based autonomous driving agent, a knowledge-intensive QA agent, and a healthcare EHRAgent.
Researcher Affiliation Academia Zhaorun Chen1, Zhen Xiang2, Chaowei Xiao3, Dawn Song4, Bo Li1,2 — 1University of Chicago, 2University of Illinois Urbana-Champaign, 3University of Wisconsin, Madison, 4University of California, Berkeley
Pseudocode Yes Algorithm 1 AGENTPOISON Trigger Optimization... Algorithm 2 Trigger Initialization
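The paper's Algorithm 1 optimizes an adversarial trigger so that queries containing it embed close to the poisoned entries in the agent's memory or knowledge base. As a rough illustration only (not the authors' implementation, which uses gradient-guided token replacement over an LLM vocabulary), a greedy coordinate-descent sketch with a caller-supplied `embed` function and toy vocabulary might look like:

```python
import numpy as np

def optimize_trigger(embed, target_center, vocab, trigger_len=3,
                     n_candidates=20, iters=10, seed=0):
    """Toy sketch of trigger optimization: repeatedly propose random
    single-token replacements and keep the trigger whose embedding is
    closest to the target (poisoned-cluster) center. `embed`, `vocab`,
    and all hyperparameter names here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    trigger = list(rng.choice(vocab, size=trigger_len))

    def score(tokens):
        # Higher score = embedding closer to the target cluster center.
        return -np.linalg.norm(embed(tokens) - target_center)

    best = score(trigger)
    for _ in range(iters):
        pos = int(rng.integers(trigger_len))  # position to mutate
        for tok in rng.choice(vocab, size=n_candidates, replace=False):
            cand = list(trigger)
            cand[pos] = tok
            s = score(cand)
            if s > best:  # keep strictly improving candidates
                best, trigger = s, cand
    return trigger, best
```

The real algorithm additionally balances retrieval effectiveness against target-action and coherence losses and uses beam search; this sketch only conveys the iterative token-replacement loop.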
Open Source Code Yes The code and data are available at https://github.com/BillChan226/AgentPoison.
Open Datasets Yes For agent-driver we use its corresponding dataset published in their paper, which contains 23k experiences in the memory unit (https://github.com/USC-GVL/Agent-Driver) ... For ReAct, we select a more challenging multi-step commonsense QA dataset, StrategyQA, which involves a curated knowledge base of 10k passages from Wikipedia (https://allenai.org/data/strategyqa).
Dataset Splits Yes Train/Test split For Agent-Driver, we have randomly sampled 250 samples from its validation set (apart from the 23k samples in the training set); for the ReAct agent, we have used the full test set of StrategyQA, which consists of 229 samples; and for EHRAgent, we have randomly selected 100 samples from its validation set in our experiment. Besides, the poisoned samples are all drawn from the training set of each agent, which does not overlap with the test set.
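The split protocol described above (evaluation queries from each agent's held-out set, poisoned samples only from its training set, with no overlap between the two) can be sketched as follows; the function name and ID-based interface are our own assumptions, not the paper's code:

```python
import random

def make_splits(train_ids, heldout_ids, n_eval, n_poison, seed=0):
    """Sample an evaluation set from the held-out pool and a poison set
    from the training pool, mirroring the paper's protocol where the
    poisoned samples never overlap with the test queries."""
    rng = random.Random(seed)
    eval_set = rng.sample(sorted(heldout_ids), n_eval)
    poison_set = rng.sample(sorted(train_ids), n_poison)
    # Disjointness holds because the two pools are themselves disjoint.
    assert not set(eval_set) & set(poison_set)
    return eval_set, poison_set
```

For example, with a 23k-experience training pool and a separate validation pool, `make_splits(train, val, n_eval=250, n_poison=20)` would reproduce the Agent-Driver-style setup (the poison count here is illustrative).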
Hardware Specification No The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100) or CPU types (e.g., Intel Core i7). It only references LLM backbones like GPT-3.5 and LLaMA3.
Software Dependencies No The paper mentions software components like 'gpt-2' as the surrogate LLM, 'ChatGPT' and 'LLaMA3' as LLM backbones, and 'DPR [14] checkpoints' and 'REALM [11] checkpoints' for retrievers. However, specific version numbers for these software dependencies are not provided.
Experiment Setup Yes The hyperparameters for AGENTPOISON and our experiments are reported in Table 5. Table 5: Hyperparameter Settings for AGENTPOISON — L_tar threshold η_tar: 0.8; number of replacement tokens m: 500; number of sub-sampled tokens s: 100; gradient accumulation steps: 30; iterations per gradient optimization: 1000; batch size: 64; surrogate LLM: gpt-2; beam size: 1.
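For reproduction purposes, the Table 5 settings can be collected into a single config object. The values below mirror the reported hyperparameters; the class and field names are our own invention, not identifiers from the authors' repository:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentPoisonConfig:
    """Hyperparameters from Table 5 of the AgentPoison paper.
    Field names are illustrative; only the values come from the paper."""
    tar_loss_threshold: float = 0.8     # L_tar threshold eta_tar
    num_replacement_tokens: int = 500   # m
    num_subsampled_tokens: int = 100    # s
    grad_accum_steps: int = 30
    iters_per_grad_opt: int = 1000
    batch_size: int = 64
    surrogate_llm: str = "gpt-2"
    beam_size: int = 1
```

A frozen dataclass keeps the reported settings immutable, so an experiment script cannot silently drift from the paper's configuration.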