BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning
Authors: Lun Wang, Zaynah Javed, Xian Wu, Wenbo Guo, Xinyu Xing, Dawn Song
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prototype and evaluate BACKDOORL in four competitive environments. The results show that when the backdoor is activated, the winning rate of the victim drops by 17% to 37% compared to when not activated. |
| Researcher Affiliation | Academia | Lun Wang¹, Zaynah Javed¹, Xian Wu², Wenbo Guo², Xinyu Xing², and Dawn Song¹ (¹University of California, Berkeley; ²Pennsylvania State University) |
| Pseudocode | No | The paper describes the methodology in prose and uses diagrams (Figure 1, Figure 2) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | No training code is released; only demo videos are available. The videos are hosted at https://github.com/wanglun1996/multi-agent-rl-backdoor-videos. |
| Open Datasets | Yes | We evaluate BACKDOORL in four different environments [Bansal et al., 2017]. |
| Dataset Splits | No | The paper mentions training and simulation, but as a reinforcement learning work it uses no static datasets and specifies no explicit train/validation/test splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (like CPU/GPU models, processor types, or memory amounts) used for running the experiments. |
| Software Dependencies | No | We implement BACKDOORL in about 1700 lines of Python code. For adversarial training, we leverage an implementation of Proximal Policy Optimization (PPO) from Stable Baselines [Hill et al., 2018]. Dependencies are named, but no versions are pinned; see the training sketch after the table. |
| Experiment Setup | Yes | To accelerate the failure, we introduce a constant penalty reward c (c < 0) for each time-step. The adversarial training typically needs 40 to 60 epochs to converge. The agent stably learns the backdoor functionality after around 150 epochs. See the reward-shaping sketch after the table. |
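
The Software Dependencies row names PPO from Stable Baselines as the training algorithm, but the authors' code is unreleased, so the sketch below is a minimal, hypothetical illustration of that setup rather than the paper's implementation. It uses the maintained Stable-Baselines3 successor API and a standard benchmark task as a stand-in for the four competitive environments of Bansal et al. [2017]; the environment choice, hyperparameters, and timestep budget are all assumptions.

```python
# Minimal sketch, NOT the authors' code: BACKDOORL's implementation is
# unreleased, so the environment and all hyperparameters are placeholders.
from stable_baselines3 import PPO

# Stand-in task; the paper trains in the four competitive environments
# of Bansal et al. [2017] (not bundled with standard Gym).
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)

# The paper reports adversarial training converging in roughly 40-60
# epochs; this timestep budget is an arbitrary placeholder.
model.learn(total_timesteps=100_000)
model.save("trojan_policy")
```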
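
The Experiment Setup row quotes a constant per-step penalty c (c < 0) that makes the trojan policy prefer failing quickly over stalling. Below is a minimal sketch of that shaping as a classic Gym reward wrapper, assuming a single-agent view of the game; the class name and the default value of c are illustrative, not from the paper.

```python
import gym


class FastFailPenalty(gym.RewardWrapper):
    """Add a constant penalty c (c < 0) at every time-step, as described
    in the paper, so losing quickly yields a higher return than stalling.

    The class name and the default value of c are assumptions; the paper
    does not publish its implementation.
    """

    def __init__(self, env: gym.Env, c: float = -1.0):
        assert c < 0, "the penalty must be negative to accelerate failure"
        super().__init__(env)
        self.c = c

    def reward(self, reward: float) -> float:
        # Shift every step's reward down by |c|.
        return reward + self.c


# Usage: wrap the environment before training the fast-failing policy.
env = FastFailPenalty(gym.make("CartPole-v1"), c=-1.0)
```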