Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
BAM-ICL: Causal Hijacking In-Context Learning with Budgeted Adversarial Manipulation
Authors: Rui Chu, Bingyin Zhao, Hanling Jiang, Shuchin Aeron, Yingjie Lao
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate BAM-ICL on diverse LLMs and datasets; the experimental results demonstrate that it achieves superior attack success rates and stealthiness, and that the adversarial ICEs are highly transferable to other models. |
| Researcher Affiliation | Collaboration | Rui Chu¹, Bingyin Zhao², Hanling Jiang¹, Shuchin Aeron¹, Yingjie Lao¹; ¹Department of Electrical and Computer Engineering, Tufts University; ²Meitu Inc. |
| Pseudocode | Yes | Algorithm 1 Offline Phase: Budget Profile Construction; Algorithm 2 Online Phase: Budgeted Hijacking Attack; Algorithm 3 Word_Proj() Word Projection Back from Embedding Space |
| Open Source Code | Yes | Code is available at https://github.com/CRcr0/BAM-ICL. |
| Open Datasets | Yes | We follow the same practice as existing attacks against LLMs (Jeong [2023]) and evaluate BAM-ICL on SST-2 (Socher et al. [2013]), AG's News (Zhang et al. [2015]), and OLID (Rosenthal et al. [2021]). |
| Dataset Splits | No | For each run of the offline phase, we select input-output pairs equal in number to the attack context length from the training set. The budget profile is averaged over multiple runs. During the online phase, the full test set is used for evaluation. While the paper mentions using a 'training set' and a 'test set', it does not provide specific dataset split percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | Yes | All experiments are performed on NVIDIA L40S GPUs. |
| Software Dependencies | No | The paper does not explicitly provide specific version numbers for key software components or libraries used in its implementation. It mentions using 'Optuna' as a hyperparameter optimization framework, but not specific versions of programming languages or deep learning frameworks. |
| Experiment Setup | Yes | For each ICE, we allow up to three tokens to be modified. The context length n ranges from 2 to 12, consistent with prior work (Qiang et al. [2023], Kandpal et al. [2023], Li et al. [2024]). We adopt the prompt construction and guiding sentence strategy from Qiang et al. [2023], combined with the sequential masking logic introduced by Garg et al. [2022]. To automate hyperparameter selection for the perturbation generation, we treat both the step size α and times t as variables in an optimization problem. Optuna's Tree-structured Parzen Estimator (TPE) sampler (Akiba et al. [2019]) iteratively proposes candidate pairs and receives feedback via an objective that reflects adversarial strength. |
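
The constraints quoted in the Experiment Setup and Dataset Splits rows (at most three modified tokens per ICE, a context length n between 2 and 12, and n input-output pairs drawn from the training set per offline run) can be written down as a small configuration check. The following is a minimal Python sketch for readers attempting a reproduction, not the authors' implementation: the names `AttackSetup`, `count_modified`, `within_budget`, and `sample_context` are hypothetical, and it assumes that modifications are same-length token substitutions and that pairs are sampled uniformly without replacement, neither of which the table confirms.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    max_modified_tokens: int = 3   # "up to three tokens" may be modified per ICE
    min_context_len: int = 2       # context length n ranges from 2 ...
    max_context_len: int = 12      # ... to 12


def count_modified(original, perturbed):
    """Count token positions that differ.

    Assumption: edits are substitutions only, so both sequences have equal length.
    """
    if len(original) != len(perturbed):
        raise ValueError("this sketch only handles same-length token sequences")
    return sum(o != p for o, p in zip(original, perturbed))


def within_budget(original, perturbed, setup=AttackSetup()):
    """Return True if the perturbed ICE changes no more tokens than allowed."""
    return count_modified(original, perturbed) <= setup.max_modified_tokens


def sample_context(train_pairs, n, setup=AttackSetup(), seed=None):
    """Draw n input-output pairs from the training set for one offline run.

    Assumption: uniform sampling without replacement; the table does not say.
    """
    if not (setup.min_context_len <= n <= setup.max_context_len):
        raise ValueError(
            f"context length must be in "
            f"[{setup.min_context_len}, {setup.max_context_len}]"
        )
    return random.Random(seed).sample(train_pairs, n)
```

Fixing `seed` per run makes the sampled context reproducible, which matters here because the table reports no split sizes or sample counts to check against.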