BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning

Authors: Lun Wang, Zaynah Javed, Xian Wu, Wenbo Guo, Xinyu Xing, Dawn Song

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prototype and evaluate BACKDOORL in four competitive environments. The results show that when the backdoor is activated, the winning rate of the victim drops by 17% to 37% compared to when not activated.
Researcher Affiliation | Academia | Lun Wang (1), Zaynah Javed (1), Xian Wu (2), Wenbo Guo (2), Xinyu Xing (2) and Dawn Song (1); (1) University of California, Berkeley; (2) Pennsylvania State University
Pseudocode | No | The paper describes the methodology in prose and with diagrams (Figures 1 and 2) but does not include structured pseudocode or algorithm blocks. (An illustrative sketch is given after this table.)
Open Source Code | No | No implementation code is released; only demonstration videos are hosted at https://github.com/wanglun1996/multi_agent_rl_backdoor_videos.
Open Datasets | Yes | We evaluate BACKDOORL in four different environments [Bansal et al., 2017]. (See the environment-loading sketch after this table.)
Dataset Splits | No | The paper mentions training and simulation, but as a reinforcement-learning work it does not use static datasets and does not specify explicit splits (e.g., percentages or sample counts) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models, processor types, or memory amounts) used for running the experiments.
Software Dependencies | No | We implement BACKDOORL in about 1700 lines of Python code. For adversarial training, we leverage an implementation of Proximal Policy Optimization (PPO) from Stable Baselines [Hill et al., 2018]. The dependencies are named but no version numbers are given. (See the PPO training sketch after this table.)
Experiment Setup | Yes | To accelerate the failure, we introduce a constant penalty reward c (c < 0) for each time-step. The adversarial training typically needs 40 to 60 epochs to converge. The agent stably learns the backdoor functionality after around 150 epochs. (See the reward-penalty sketch after this table.)
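
Pseudocode sketch. The paper gives no algorithm block; the following is a minimal, hedged sketch of the behaviour the assessment refers to: the victim acts normally until a trigger is detected in its observation of the opponent, then switches to a fast-failing policy. Every name here (backdoored_policy, trigger_detected, normal_policy, fail_fast_policy) is a hypothetical placeholder, not an identifier from the paper.

```python
# Hypothetical sketch only; the paper contains no pseudocode, and these names
# are illustrative placeholders rather than the authors' method.
def backdoored_policy(observation, trigger_detected, normal_policy, fail_fast_policy):
    """Return the victim's action for the current observation."""
    if trigger_detected(observation):
        # Backdoor activated: use the policy trained (with a constant negative
        # per-step reward) to lose as quickly as possible.
        return fail_fast_policy(observation)
    # Otherwise behave like a normally trained agent.
    return normal_policy(observation)
```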
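
Environment loading. The four evaluation environments come from the multi-agent competition suite of Bansal et al. [2017]. A minimal sketch of instantiating one of them is shown below; the package name gym_compete and the environment ID are assumptions about that suite, not details stated in this paper.

```python
# Assumes the multi-agent competition suite of Bansal et al. [2017]
# (openai/multiagent-competition); package name and environment ID are assumptions.
import gym
import gym_compete  # importing registers the competitive environments with Gym

env = gym.make("run-to-goal-humans-v0")  # assumed ID of one two-player environment
observations = env.reset()               # one observation per agent
```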
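
PPO training. The paper uses a PPO implementation from Stable Baselines [Hill et al., 2018]. A minimal, self-contained sketch with that library is shown below; the environment, timestep budget, and hyperparameters are placeholders rather than values from the paper.

```python
# Minimal Stable Baselines (TensorFlow-based, Hill et al. 2018) PPO example.
# The environment and timestep budget are placeholders, not the paper's settings.
import gym
from stable_baselines import PPO2

env = gym.make("CartPole-v1")              # placeholder single-agent environment
model = PPO2("MlpPolicy", env, verbose=1)  # PPO with a default MLP policy
model.learn(total_timesteps=100_000)       # the paper reports 40-60 epochs to converge
model.save("ppo_agent")
```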
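
Constant per-step penalty. The setup introduces a constant penalty reward c (c < 0) at every time-step to accelerate failure. A hedged Gym-style reward wrapper implementing that idea might look as follows; the class name and the default value of c are illustrative, not the authors' code.

```python
# Illustrative Gym reward wrapper adding a constant negative per-step penalty,
# as described in the experiment setup; not the authors' implementation.
import gym


class ConstantPenaltyWrapper(gym.RewardWrapper):
    def __init__(self, env, c=-1.0):
        super().__init__(env)
        assert c < 0, "the penalty c must be negative"
        self.c = c

    def reward(self, reward):
        # Add the constant penalty c to the reward at every time-step.
        return reward + self.c


# Usage: env = ConstantPenaltyWrapper(gym.make("CartPole-v1"), c=-0.5)
```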