Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Memory Injection Attacks on LLM Agents via Query-Only Interaction

Authors: Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, Zhen J. Xiang

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our extensive experiments across diverse agents demonstrate the effectiveness of MINJA in compromising agent memory. |
| Researcher Affiliation | Academia | Shen Dong¹, Shaochen Xu², Pengfei He¹, Yige Li³, Jiliang Tang¹, Tianming Liu², Hui Liu¹, Zhen Xiang²; ¹Michigan State University, ²University of Georgia, ³Singapore Management University |
| Pseudocode | Yes | The full procedure is detailed in Algorithm 1, Appendix A. |
| Open Source Code | No | "Code can be found on github" and "We will release our code to support further exploration in this area." Also, in the NeurIPS Paper Checklist, section 5 states: "The datasets used (e.g., MIMIC-III, eICU, MMLU) are public, while the code will be released later." |
| Open Datasets | Yes | We test MINJA on three types of existing agents based on different LLMs across diverse tasks, encompassing healthcare, web activities, and general QA. Below are their details: (1) RAP [20]... on the Webshop dataset [42]... (2) EHRAgent [29]... adopt two real-world EHR datasets... MIMIC-III, and eICU... (3) We build a QA Agent... using the MMLU dataset [15]. |
| Dataset Splits | Yes | For each victim-target pair on MMLU, we randomly select 10 queries containing the victim term as the attack queries. For the other three datasets, we randomly select 15 attack queries for each victim-target pair... we reserve 50 additional benign queries for EHRAgent and RAP, and 30 benign queries for QA Agent irrelevant to the victim term for regular users... After all attack queries are submitted ..., we evaluate the agent on a separate set of victim queries (10 for MMLU, 30 for other datasets) that include the victim term... evaluate the agent on a set of benign queries (10 for MMLU, 30 for other datasets) that do not contain the victim term, respectively. |
| Hardware Specification | Yes | We conducted all experiments using API access to OpenAI's GPT-4 and GPT-4o models, as described in Section 5.1. |
| Software Dependencies | No | In this work, we mainly use the cosine similarity computed on text embeddings of all-MiniLM-L6-v2 for EHRAgent and RAP, and text-embedding-ada-002 for QA Agent. The performance of MINJA with other embedding models is shown in Section 5.3... DPR [21], REALM [14], ANCE [40], BGE [25], text-embedding-ada-002 (ada-002), and all-MiniLM-L6-v2 (MiniLM). |
| Experiment Setup | Yes | For RAP, EHRAgent, and QA Agent, 3/4/5 memory records with the highest input similarities are retrieved from the memory bank as demonstrations, respectively. In this work, we mainly use the cosine similarity computed on text embeddings of all-MiniLM-L6-v2 for EHRAgent and RAP, and text-embedding-ada-002 for QA Agent... For Patient ID, Medication, Items, and Terms, we shorten the indication prompt 4, 5, 5, and 5 times respectively. |
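
The retrieval step quoted in the Experiment Setup row (select the memory records whose inputs are most similar to the query under cosine similarity) can be illustrated with a minimal sketch. This is not the authors' code, which had not been released at the time of scoring. The function names, the in-memory list, and the toy three-dimensional vectors are assumptions made for illustration; the quoted setup uses embeddings from all-MiniLM-L6-v2 or text-embedding-ada-002 and retrieves 3, 4, or 5 records for RAP, EHRAgent, and QA Agent respectively.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_embedding, memory_bank, k):
    """Return the k records whose stored input embeddings best match the query.

    memory_bank is a list of (record, embedding) pairs (hypothetical layout).
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), record)
        for record, embedding in memory_bank
    ]
    # Highest similarity first; sort on the score only.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:k]]


# Toy embeddings standing in for real text embeddings (illustrative values).
memory_bank = [
    ("record A", [1.0, 0.0, 0.0]),
    ("record B", [0.9, 0.1, 0.0]),
    ("record C", [0.0, 1.0, 0.0]),
    ("record D", [0.0, 0.0, 1.0]),
]

# k=3 mirrors the number of demonstrations quoted for RAP.
top = retrieve_top_k([1.0, 0.05, 0.0], memory_bank, k=3)
print(top)
```

Because the retrieved records are used as demonstrations, any record that ranks highly for a victim query influences the agent's output, which is why the embedding model and the value of k are relevant setup details for reproducing the reported results.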