Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models
Authors: Kangjie Chen, Yuxian Meng, Xiaofei Sun, Shangwei Guo, Tianwei Zhang, Jiwei Li, Chun Fan
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that our approach can compromise a wide range of downstream NLP tasks in an effective and stealthy way. |
| Researcher Affiliation | Collaboration | Kangjie Chen1, Yuxian Meng2, Xiaofei Sun2, Shangwei Guo3, Tianwei Zhang1, Jiwei Li2,4, and Chun Fan5, 1Nanyang Technological University, 2Shannon.AI, 3Chongqing University, 4Zhejiang University, 5Computer Center of Peking University & Peng Cheng Laboratory |
| Pseudocode | Yes | Algorithm 1 (in Appendix) illustrates the details of embedding backdoors into a foundation model, as explained below. ... Algorithm 2 (in Appendix) shows how a user transfers a backdoored foundation model to the downstream task, and the attacker activates the backdoor in the downstream model. |
| Open Source Code | No | The paper does not provide concrete access to its own source code, stating only that it used 'open-sourced code' for comparison with RIPPLe. |
| Open Datasets | Yes | We selected a public corpora as the clean training data (i.e., English Wikipedia) (Devlin et al., 2018)... We select 8 tasks from the popular General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018)... we select SQuAD V2.0 (Rajpurkar et al., 2016)... we select CoNLL-2003 (Sang, 2002). |
| Dataset Splits | Yes | We also prepare a validation set containing the clean and malicious samples following the above approach. We keep fine-tuning the model until it achieves the lowest loss on this validation set for both benign and malicious data |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using BERT, Hugging Face, and Transformers baselines but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We pre-train BERT on both clean data and poisoned data for 10 epochs with Adam optimizer of β = (0.9, 0.98), a learning rate of 2e-5 and a batch size of 2048. |
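
The hyperparameters quoted in the Experiment Setup row can be captured as a small configuration object. The sketch below is illustrative only and is not the authors' code: the class name `PretrainConfig`, its field names, and the `num_examples` argument are assumptions introduced here, and the table does not state the size of the pre-training corpus. The keyword names returned by `adam_kwargs` mirror those accepted by `torch.optim.Adam` (`lr`, `betas`), though PyTorch itself is not imported.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    """Values quoted in the Experiment Setup row (names are hypothetical)."""

    epochs: int = 10
    learning_rate: float = 2e-5
    adam_betas: tuple = (0.9, 0.98)
    batch_size: int = 2048

    def adam_kwargs(self) -> dict:
        # Keyword names follow torch.optim.Adam: `lr` and `betas`.
        return {"lr": self.learning_rate, "betas": self.adam_betas}

    def total_steps(self, num_examples: int) -> int:
        # Optimizer steps across all epochs, counting a final partial batch.
        # `num_examples` is a placeholder; the table does not report it.
        return self.epochs * math.ceil(num_examples / self.batch_size)


config = PretrainConfig()
```

With PyTorch installed, `torch.optim.Adam(model.parameters(), **config.adam_kwargs())` would build an optimizer with the reported settings, assuming `model` is an already constructed module.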