Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
Authors: Kaijie Zhu, Xianjun Yang, Jindong Wang, Wenbo Guo, William Yang Wang
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation on the IPI benchmark Agent Dojo demonstrates that MELON outperforms SOTA defenses in both attack prevention and utility preservation. Moreover, we show that combining MELON with a SOTA prompt augmentation defense (denoted as MELON-Aug) further improves its performance. We also conduct a detailed ablation study to validate our key designs. |
| Researcher Affiliation | Academia | ¹University of California, Santa Barbara; ²William & Mary. Correspondence to: Kaijie Zhu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 MELON Algorithm at Step t |
| Open Source Code | Yes | Code is available at https://github.com/kaijiezhu11/MELON. |
| Open Datasets | Yes | We evaluate MELON on the IPI benchmark Agent Dojo (Debenedetti et al., 2024). Agent Dojo is an evaluation framework for assessing AI agents' robustness against indirect prompt injection attacks. |
| Dataset Splits | No | Agent Dojo includes 16, 21, 20, and 40 user tasks for its respective agents, and each agent also has different attack tasks and injection points. Pairing one user task with one attack task forms an attack case, yielding 629 attack cases in total. The paper describes how attack cases are formed but does not specify training, validation, or test splits for any dataset, nor does it mention any splitting methodology with percentages, sample counts, or random seeds. |
| Hardware Specification | No | We consider three models as the LLM model in each agent: GPT-4o, o3-mini, and Llama-3.3-70B. The paper specifies the LLM models used but does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | Next, we employ OpenAI's text-embedding-v3 model (OpenAI, 2024), which maps these descriptions to dense vector representations. The paper mentions using a specific OpenAI embedding model but does not list any other software dependencies, libraries, or their version numbers that would be necessary to replicate the experiment environment. |
| Experiment Setup | Yes | We set the temperature for each model as 0 to avoid randomness. We set the primary similarity threshold θ = 0.8 to balance detection sensitivity and false positive rate, the ablation study on different similarity thresholds is presented in Section 4.3. The task-neutral prompt Tf is designed to be independent of specific domains or tasks. |