BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
Authors: Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show the effectiveness of BadChain for two CoT strategies across four LLMs (Llama2, GPT-3.5, PaLM2, and GPT-4) and six complex benchmark tasks encompassing arithmetic, commonsense, and symbolic reasoning. We conduct extensive empirical evaluations for BadChain under different settings. |
| Researcher Affiliation | Academia | Zhen Xiang¹, Fengqing Jiang², Zidi Xiong¹, Bhaskar Ramasubramanian³, Radha Poovendran², Bo Li¹ — ¹University of Illinois Urbana-Champaign, ²University of Washington, ³Western Washington University |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code related to this work is available at https://github.com/Django-Jiang/BadChain. |
| Open Datasets | Yes | Datasets: Following prior works on CoT like (Wei et al., 2022; Wang et al., 2023b), we consider six benchmark datasets encompassing three categories of challenging reasoning tasks. For arithmetic reasoning, we consider three datasets on math word problems, including GSM8K (Cobbe et al., 2021), MATH (Hendrycks et al., 2021), and ASDiv (Miao et al., 2020). For commonsense reasoning, we consider CSQA for multiple-choice questions (Talmor et al., 2019) and StrategyQA for true or false questions (Geva et al., 2021). For symbolic reasoning, we consider Letter, a dataset for last letter concatenation by Wei et al. (2022). More details about these datasets are shown in App. A.1. |
| Dataset Splits | Yes | For each model on each dataset, we poison a specific proportion of demonstrations, which is detailed in Tab. 4 in App. A.3. Again, these choices can be easily determined in practice using merely twenty clean instances, as demonstrated by our ablation studies in Sec. 4.4. |
| Hardware Specification | No | The paper mentions the use of LLMs like GPT-3.5, GPT-4, PaLM2, and Llama2, along with some inference settings (e.g., temperature, top-p, float16 data type). However, it does not specify the underlying hardware (e.g., specific GPU or CPU models, memory details) used to run these models or conduct the experiments. |
| Software Dependencies | No | The paper mentions LLMs such as GPT-3.5, GPT-4, PaLM2, and Llama2, and refers to OpenAI documentation. It specifies a "float16 data type" for Llama2 inference. However, it does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used (e.g., Python 3.x). |
| Experiment Setup | Yes | We follow the decoding strategy from the OpenAI (2023b) documentation, setting temperature to 1 and top-p to 1. The decoding strategy is set to temperature = 0.7, top-p = 0.95, top-k = 40 by default [for PaLM2]. The decoding strategy is set to temperature = 1, top-p = 0.7, top-k = 50 [for Llama2]. |
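
The per-model decoding parameters quoted above can be collected in one place. This is a minimal sketch, assuming one wants to reproduce the paper's sampling settings; the dict layout and helper function are illustrative and not taken from the authors' code:

```python
# Decoding parameters as reported in the paper's experiment setup.
# Keys and the helper below are illustrative, not from the BadChain repo.
DECODING_CONFIGS = {
    "gpt":    {"temperature": 1.0, "top_p": 1.0},              # GPT-3.5 / GPT-4
    "palm2":  {"temperature": 0.7, "top_p": 0.95, "top_k": 40},
    "llama2": {"temperature": 1.0, "top_p": 0.7,  "top_k": 50},
}

def decoding_kwargs(model_family: str) -> dict:
    """Return the sampling parameters for a given model family."""
    return dict(DECODING_CONFIGS[model_family])
```

These keyword arguments could then be passed to whichever generation API is in use (e.g., as sampling parameters for a chat-completion or `generate` call).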