Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
TRAP: Targeted Redirecting of Agentic Preferences
Authors: Hangoo Kang, Jehyeok Yeon, Gagandeep Singh
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TRAP on the Microsoft Common Objects in Context (COCO) dataset, building multi-candidate decision scenarios. Across these scenarios, TRAP consistently induces decision-level preference redirection on leading models, including LLaVA-34B, Gemma3, GPT-4o, and Mistral-3.2, significantly outperforming existing baselines such as SPSA, Bandit, and standard diffusion approaches. These findings expose a critical, generalized vulnerability: autonomous agents can be consistently misled through visually subtle, semantically-guided cross-modal manipulations. |
| Researcher Affiliation | Academia | University of Illinois Urbana-Champaign |
| Pseudocode | Yes | A.1 Algorithm. Algorithm 1: TRAP Framework |
| Open Source Code | Yes | The code for TRAP is accessible on GitHub at https://github.com/uiuc-focal-lab/TRAP. |
| Open Datasets | Yes | We evaluate TRAP on 100 image-caption pairs from the popular COCO Captions dataset [Chen et al., 2015], simulating a black-box n-way selection setting. |
| Dataset Splits | Yes | We evaluate our attack on 100 image-caption pairs from the popular COCO Captions dataset [Chen et al., 2015], simulating a black-box n-way selection setting. For each instance, a "bad image" is generated using a negative prompt created via Llama-3-8B [Grattafiori et al., 2024]. This image is verified to have an initial selection probability below the majority threshold when compared against n - 1 competitors, ensuring a challenging starting point for optimization. |
| Hardware Specification | Yes | All experiments were run on a server with four NVIDIA A100-PCIE-40GB GPUs and a 48-core Intel Xeon Silver 4214R CPU. |
| Software Dependencies | Yes | All experiments are implemented in PyTorch. We use CLIP ViT-B/32 [Radford et al., 2021] for embedding extraction, with adversarial image decoding performed by Stable Diffusion v2.1 (base) through the Img2Img interface. |
| Experiment Setup | Yes | Optimization is performed with Adam (learning rate 0.005, 20 steps per iteration). Grid search is conducted over diffusion strength [0.3, 0.8] and CFG [2.0, 12.0] with initial values of 0.5 and 7.5, respectively. |
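
The Experiment Setup row can be read as a small configuration plus a search grid. The following is a minimal sketch of that reading, not the authors' implementation: the ranges, initial values, learning rate, and step count come from the row above, while the names `SetupConfig`, `linspace`, and `build_grid`, the grid resolution (6 strength values, 21 CFG values), and the nearest-to-initial visiting order are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SetupConfig:
    # Values quoted in the Experiment Setup row.
    learning_rate: float = 0.005        # Adam learning rate
    steps_per_iteration: int = 20       # optimizer steps per iteration
    strength_range: tuple = (0.3, 0.8)  # diffusion strength search range
    cfg_range: tuple = (2.0, 12.0)      # CFG search range
    initial_strength: float = 0.5
    initial_cfg: float = 7.5
    # Assumption: the source does not state the grid resolution.
    strength_points: int = 6            # step of 0.1
    cfg_points: int = 21                # step of 0.5


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi, inclusive."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [round(lo + i * step, 6) for i in range(n)]


def build_grid(cfg):
    """List (strength, cfg_scale) pairs, nearest to the initial values first."""
    strengths = linspace(*cfg.strength_range, cfg.strength_points)
    scales = linspace(*cfg.cfg_range, cfg.cfg_points)
    s_span = cfg.strength_range[1] - cfg.strength_range[0]
    c_span = cfg.cfg_range[1] - cfg.cfg_range[0]

    def distance(point):
        # Assumption: order by range-normalized squared distance
        # from the initial (strength, CFG) pair.
        s, c = point
        ds = (s - cfg.initial_strength) / s_span
        dc = (c - cfg.initial_cfg) / c_span
        return ds * ds + dc * dc

    return sorted(product(strengths, scales), key=distance)


if __name__ == "__main__":
    grid = build_grid(SetupConfig())
    print(len(grid), "grid points; first candidate:", grid[0])
```

With this hypothetical resolution the initial values (0.5, 7.5) fall on the grid, so the search begins at the reported starting point before moving outward.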