Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

TRAP: Targeted Redirecting of Agentic Preferences

Authors: Hangoo Kang, Jehyeok Yeon, Gagandeep Singh

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate TRAP on the Microsoft Common Objects in Context (COCO) dataset, building multi-candidate decision scenarios. Across these scenarios, TRAP consistently induces decision-level preference redirection on leading models, including LLaVA-34B, Gemma3, GPT-4o, and Mistral-3.2, significantly outperforming existing baselines such as SPSA, Bandit, and standard diffusion approaches. These findings expose a critical, generalized vulnerability: autonomous agents can be consistently misled through visually subtle, semantically-guided cross-modal manipulations.
Researcher Affiliation	Academia	University of Illinois Urbana-Champaign
Pseudocode	Yes	Appendix A.1 (Algorithm), Algorithm 1: TRAP Framework
Open Source Code	Yes	The code for TRAP is accessible on GitHub at https://github.com/uiuc-focal-lab/TRAP.
Open Datasets	Yes	We evaluate TRAP on 100 image-caption pairs from the popular COCO Captions dataset [Chen et al., 2015], simulating a black-box n-way selection setting.
Dataset Splits	Yes	We evaluate our attack on 100 image-caption pairs from the popular COCO Captions dataset [Chen et al., 2015], simulating a black-box n-way selection setting. For each instance, a "bad image" is generated using a negative prompt created via Llama-3-8B [Grattafiori et al., 2024]. This image is verified to have an initial selection probability below the majority threshold when compared against n − 1 competitors, ensuring a challenging starting point for optimization.
Hardware Specification	Yes	All experiments were run on a server with four NVIDIA A100-PCIE-40GB GPUs and a 48-core Intel Xeon Silver 4214R CPU.
Software Dependencies	Yes	All experiments are implemented in PyTorch. We use CLIP ViT-B/32 [Radford et al., 2021] for embedding extraction, with adversarial image decoding performed by Stable Diffusion v2.1 (base) through the Img2Img interface.
Experiment Setup	Yes	Optimization is performed with Adam (learning rate 0.005, 20 steps per iteration). Grid search is conducted over diffusion strength [0.3, 0.8] and CFG [2.0, 12.0] with initial values of 0.5 and 7.5, respectively.
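
The grid search in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ranges and initial values come from the row above, while the grid resolution (`num_strength`, `num_cfg`), the helper names, and the `score_fn` callback are hypothetical. In the paper's setting the score would presumably be the selection probability of the decoded image, and the Adam inner loop (learning rate 0.005, 20 steps per iteration) is not reproduced here.

```python
from itertools import product

# Ranges and initial values as reported in the Experiment Setup row.
STRENGTH_RANGE = (0.3, 0.8)   # diffusion strength
CFG_RANGE = (2.0, 12.0)       # classifier-free guidance (CFG) scale
INITIAL = (0.5, 7.5)          # (strength, CFG) starting point


def linspace(lo, hi, num):
    """Return `num` evenly spaced values from lo to hi, inclusive."""
    if num < 2:
        raise ValueError("num must be at least 2")
    step = (hi - lo) / (num - 1)
    # Round to suppress floating-point noise such as 0.30000000000000004.
    return [round(lo + i * step, 6) for i in range(num)]


def build_configs(num_strength=6, num_cfg=5):
    """List (strength, CFG) pairs, trying the reported initial values first.

    The grid resolution is an assumption; the source gives only the ranges.
    """
    strengths = linspace(*STRENGTH_RANGE, num_strength)
    cfgs = linspace(*CFG_RANGE, num_cfg)
    grid = [c for c in product(strengths, cfgs) if c != INITIAL]
    return [INITIAL] + grid


def grid_search(score_fn, configs):
    """Return the configuration with the highest score.

    `score_fn(strength, cfg)` is a hypothetical callback standing in for
    whatever objective is evaluated per configuration.
    """
    return max(configs, key=lambda c: score_fn(*c))


if __name__ == "__main__":
    # Toy objective for demonstration only; it peaks at strength 0.6, CFG 7.0.
    def toy_score(strength, cfg):
        return -(abs(strength - 0.6) + abs(cfg - 7.0))

    print(grid_search(toy_score, build_configs()))
```

Placing the reported initial values at the head of the list means an early-stopping variant would evaluate the documented starting point before any other configuration.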