Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Backdoor Attacks in Token Selection of Attention Mechanism
Authors: Yunjuan Wang, Raman Arora
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our theoretical findings using synthetic datasets. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Johns Hopkins University, Baltimore, USA. |
| Pseudocode | No | The paper describes mathematical proofs and theoretical analysis but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information for source code, such as a repository link, an explicit code release statement, or mention of code in supplementary materials. |
| Open Datasets | Yes | We empirically validate our theoretical findings using synthetic datasets. We adopt the dirty-label backdoor attack setup defined in Section 3. Standard and poisoned signal vectors are constructed from orthogonal basis directions: µ1 = µ e1, µ2 = µ e2 (standard) and ν1 = α µ e3, ν2 = α µ e4 (poisoned). We designate the first |R| tokens as relevant and the last |P| tokens as poisoned for the poisoned data. We generate n = 20 training samples, along with 1000 standard test samples and 1000 poisoned test samples. Noise vectors are drawn from a standard multivariate Gaussian with covariance Σ = Id, yielding Tr(Σ) = d. |
| Dataset Splits | Yes | We generate n = 20 training samples, along with 1000 standard test samples and 1000 poisoned test samples. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers, such as programming languages, libraries, or solvers, which are needed to replicate the experiment. |
| Experiment Setup | Yes | The token length is set to T = 8, dimension d = 4000, with |R| = 1 and |P| = 1. A single-head self-attention transformer is trained using gradient descent with step size η = 0.001 for τ0 = 10K iterations. Additional results are provided in Appendix B. [...] We run τ0 = 1K iterations with step size η = 0.01. |
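The setup quoted in the table can be sketched end to end: synthetic tokens with one relevant and one poisoned position, Gaussian noise with Σ = Id, and a single-head attention scorer trained by gradient descent. This is a minimal sketch, not the paper's implementation: the signal scale µ, poison scale α, poisoning fraction, and fixed value vector `v` are hypothetical choices, and the iteration count is reduced from the paper's 10K for a quick run.

```python
import numpy as np

rng = np.random.default_rng(0)

T, d = 8, 4000                 # token length and dimension (from the table)
n_train = 20                   # training samples (from the table)
mu_norm, alpha = 8.0, 0.5      # hypothetical signal / poison scales

e = np.eye(d)                  # orthogonal basis directions e1..e4
mu1, mu2 = mu_norm * e[0], mu_norm * e[1]                  # standard signals
nu1, nu2 = alpha * mu_norm * e[2], alpha * mu_norm * e[3]  # poisoned signals

def make_sample(label, poisoned):
    """T tokens of dim d: token 0 is the single relevant token (|R| = 1),
    the last token is the single poisoned token (|P| = 1), the rest are
    Gaussian noise with covariance Sigma = I_d (so Tr(Sigma) = d)."""
    X = rng.standard_normal((T, d))
    X[0] = mu1 if label == 1 else mu2
    if poisoned:
        X[-1] = nu1 if label == 1 else nu2
    return X

labels = np.array([1 if i % 2 == 0 else -1 for i in range(n_train)])
poison_mask = np.arange(n_train) % 4 == 0    # hypothetical poisoning fraction
X_train = np.stack([make_sample(y, m) for y, m in zip(labels, poison_mask)])

# Toy single-head attention scorer f(X) = v . (softmax(Xw)^T X), with only
# the attention vector w trained by gradient descent -- a simplification of
# the paper's single-head self-attention transformer.
v = (mu1 - mu2) / mu_norm      # fixed value direction (assumption)
w = np.zeros(d)
eta, tau0 = 0.001, 1000        # paper: eta = 0.001; tau0 reduced for speed

u = X_train @ v                # (n, T) per-token value scores; fixed
for _ in range(tau0):
    s = X_train @ w                              # (n, T) attention logits
    p = np.exp(s - s.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)            # softmax token selection
    f = (p * u).sum(axis=1)                      # (n,) model outputs
    coef = -labels / (1 + np.exp(labels * f))    # d(logistic loss)/df
    # chain rule: df/dw = X^T (p * (u - f)), summed over samples
    g = np.einsum('ntd,nt->d', X_train,
                  coef[:, None] * (p * (u - f[:, None])))
    w -= eta * g / n_train

# On a fresh clean sample, trained attention should favor the relevant token.
s_clean = make_sample(1, poisoned=False) @ w
p_clean = np.exp(s_clean - s_clean.max())
p_clean /= p_clean.sum()
```

Since the relevant token carries the only signal aligned with the value direction, gradient descent grows `w` along the standard signal directions, so the softmax weight on token 0 of a clean sample rises well above the uniform 1/T.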