AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

Authors: Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate AGENTPOISON's effectiveness in attacking three types of real-world LLM agents: a RAG-based autonomous driving agent, a knowledge-intensive QA agent, and a healthcare EHRAgent.
Researcher Affiliation Academia Zhaorun Chen1, Zhen Xiang2, Chaowei Xiao3, Dawn Song4, Bo Li1,2 — 1University of Chicago, 2University of Illinois Urbana-Champaign, 3University of Wisconsin, Madison, 4University of California, Berkeley
Pseudocode Yes Algorithm 1 AGENTPOISON Trigger Optimization... Algorithm 2 Trigger Initialization
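The paper's Algorithm 1 optimizes an adversarial trigger so that queries containing it embed close to the poisoned entries in the agent's memory or knowledge base. As a rough illustration only (not the authors' implementation, which uses gradient-guided token replacement over an LLM vocabulary), a greedy coordinate-descent sketch with a caller-supplied `embed` function and toy vocabulary might look like:

```python
import numpy as np

def optimize_trigger(embed, target_center, vocab, trigger_len=3,
                     n_candidates=20, iters=10, seed=0):
    """Toy sketch of trigger optimization: repeatedly propose random
    single-token replacements and keep the trigger whose embedding is
    closest to the target (poisoned-cluster) center. `embed`, `vocab`,
    and all hyperparameter names here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    trigger = list(rng.choice(vocab, size=trigger_len))

    def score(tokens):
        # Higher score = embedding closer to the target cluster center.
        return -np.linalg.norm(embed(tokens) - target_center)

    best = score(trigger)
    for _ in range(iters):
        pos = int(rng.integers(trigger_len))  # position to mutate
        for tok in rng.choice(vocab, size=n_candidates, replace=False):
            cand = list(trigger)
            cand[pos] = tok
            s = score(cand)
            if s > best:  # keep strictly improving candidates
                best, trigger = s, cand
    return trigger, best
```

The real algorithm additionally balances retrieval effectiveness against target-action and coherence losses and uses beam search; this sketch only conveys the iterative token-replacement loop.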
Open Source Code Yes The code and data are available at https://github.com/BillChan226/AgentPoison.
Open Datasets Yes For agent-driver we use its corresponding dataset published in their paper, which contains 23k experiences in the memory unit (https://github.com/USC-GVL/Agent-Driver) ... For ReAct, we select a more challenging multi-step commonsense QA dataset, StrategyQA, which involves a curated knowledge base of 10k passages from Wikipedia (https://allenai.org/data/strategyqa).
Dataset Splits Yes Train/Test split For Agent-Driver, we have randomly sampled 250 samples from its validation set (apart from the 23k samples in the training set); for the ReAct agent, we have used the full test set of StrategyQA, which consists of 229 samples; and for EHRAgent, we have randomly selected 100 samples from its validation set in our experiment. Besides, the poisoned samples are all drawn from the training set of each agent, which does not overlap with the test set.
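The split protocol described above (evaluation queries from each agent's held-out set, poisoned samples only from its training set, with no overlap between the two) can be sketched as follows; the function name and ID-based interface are our own assumptions, not the paper's code:

```python
import random

def make_splits(train_ids, heldout_ids, n_eval, n_poison, seed=0):
    """Sample an evaluation set from the held-out pool and a poison set
    from the training pool, mirroring the paper's protocol where the
    poisoned samples never overlap with the test queries."""
    rng = random.Random(seed)
    eval_set = rng.sample(sorted(heldout_ids), n_eval)
    poison_set = rng.sample(sorted(train_ids), n_poison)
    # Disjointness holds because the two pools are themselves disjoint.
    assert not set(eval_set) & set(poison_set)
    return eval_set, poison_set
```

For example, with a 23k-experience training pool and a separate validation pool, `make_splits(train, val, n_eval=250, n_poison=20)` would reproduce the Agent-Driver-style setup (the poison count here is illustrative).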
Hardware Specification No The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100) or CPU types (e.g., Intel Core i7). It only references LLM backbones like GPT-3.5 and LLaMA3.
Software Dependencies No The paper mentions software components like 'gpt-2' as the surrogate LLM, 'ChatGPT' and 'LLaMA3' as LLM backbones, and 'DPR [14] checkpoints' and 'REALM [11] checkpoints' for retrievers. However, specific version numbers for these software dependencies are not provided.
Experiment Setup Yes The hyperparameters for AGENTPOISON and our experiments are reported in Table 5. Table 5: Hyperparameter Settings for AGENTPOISON — L_tar threshold η_tar: 0.8; number of replacement tokens m: 500; number of sub-sampled tokens s: 100; gradient accumulation steps: 30; iterations per gradient optimization: 1000; batch size: 64; surrogate LLM: gpt-2; beam size: 1.
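For reproduction purposes, the Table 5 settings can be collected into a single config object. The values below mirror the reported hyperparameters; the class and field names are our own invention, not identifiers from the authors' repository:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentPoisonConfig:
    """Hyperparameters from Table 5 of the AgentPoison paper.
    Field names are illustrative; only the values come from the paper."""
    tar_loss_threshold: float = 0.8     # L_tar threshold eta_tar
    num_replacement_tokens: int = 500   # m
    num_subsampled_tokens: int = 100    # s
    grad_accum_steps: int = 30
    iters_per_grad_opt: int = 1000
    batch_size: int = 64
    surrogate_llm: str = "gpt-2"
    beam_size: int = 1
```

A frozen dataclass keeps the reported settings immutable, so an experiment script cannot silently drift from the paper's configuration.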