BadRL: Sparse Targeted Backdoor Attack against Reinforcement Learning

Authors: Jing Cui, Yufei Han, Yuzhe Ma, Jianbin Jiao, Junge Zhang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning efforts (0.003% of total training steps) during training and infrequent attacks during testing. Code is available at: https://github.com/7777777cc/code.
Researcher Affiliation | Collaboration | Jing Cui (1), Yufei Han (2), Yuzhe Ma (3), Jianbin Jiao (1), Junge Zhang (4,1)*; (1) University of Chinese Academy of Sciences, (2) INRIA, (3) Microsoft Azure AI, (4) Institute of Automation, Chinese Academy of Sciences
Pseudocode | Yes | Algorithm 1: BadRL Algorithm
Open Source Code | Yes | Code is available at: https://github.com/7777777cc/code.
Open Datasets | Yes | Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent... Empirical evaluations on four classic RL tasks reveal that BadRL-based backdoor attacks... Pong, Breakout, Qbert, Space Invaders
Dataset Splits | No | The paper reports 'poisoning proportion: 0.003%, 0.003%, 0.002%, 0.002% for Pong, Breakout, Qbert, Space Invaders' as part of the training effort, but does not specify dataset splits (e.g., train/validation/test percentages or counts) for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies or library versions (e.g., Python version, PyTorch version) needed to replicate the experiments.
Experiment Setup | Yes | Poisoning proportion: 0.003%, 0.003%, 0.002%, 0.002% for Pong, Breakout, Qbert, Space Invaders. Models are tested every 10000 steps. (See the budget sketch below.)
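To make the reported sparsity concrete, here is a minimal sketch of the budget arithmetic. The 10-million-step training budget is an illustrative assumption, not a figure from the paper; only the percentages come from the report above.

```python
# Sketch: convert the reported poisoning proportions (percentages)
# into absolute poisoned-step counts. TOTAL_STEPS is an assumed
# per-task training budget, not a value stated in the paper.
TOTAL_STEPS = 10_000_000

poisoning_proportion_pct = {
    "Pong": 0.003,
    "Breakout": 0.003,
    "Qbert": 0.002,
    "Space Invaders": 0.002,
}

for task, pct in poisoning_proportion_pct.items():
    poisoned_steps = round(TOTAL_STEPS * pct / 100.0)
    print(f"{task}: ~{poisoned_steps} poisoned steps of {TOTAL_STEPS:,}")
```

Under this assumption, 0.003% corresponds to roughly 300 poisoned steps out of 10 million, which is what makes the attack "sparse".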
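Since the paper's Algorithm 1 (BadRL Algorithm) is only referenced in the report, not reproduced, the following generic sketch shows where sparse, targeted poisoning typically hooks into an RL training loop. Every name in it (attack_value, apply_trigger, target_action, the threshold) is a hypothetical placeholder, not the authors' API or their exact algorithm.

```python
from typing import Callable, Tuple
import numpy as np

def maybe_poison(
    obs: np.ndarray,
    action: int,
    reward: float,
    budget_left: int,
    attack_value: Callable[[np.ndarray], float],        # hypothetical state scorer
    apply_trigger: Callable[[np.ndarray], np.ndarray],  # hypothetical trigger stamp
    target_action: int,
    threshold: float = 0.9,
) -> Tuple[np.ndarray, int, float, int]:
    """Poison a transition only when budget remains and the state looks
    like a high-value attack opportunity. This mirrors the sparse,
    targeted idea described in the paper's abstract, not the authors'
    exact Algorithm 1."""
    if budget_left > 0 and attack_value(obs) > threshold:
        obs = apply_trigger(obs)   # stamp the backdoor trigger pattern
        action = target_action     # substitute the attacker-chosen action
        reward = 1.0               # reward the triggered behavior
        budget_left -= 1
    return obs, action, reward, budget_left
```

The gating condition is the key design point: poisoning only a few hundred carefully chosen steps (per the budget arithmetic above) is what keeps the attack hard to detect during training.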