SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents

Authors: Ethan Rathbun, Christopher Amato, Alina Oprea

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods, while preserving benign episodic return.
Researcher Affiliation | Academia | Ethan Rathbun, Christopher Amato, Alina Oprea; Khoury College of Computer Sciences, Northeastern University
Pseudocode | Yes | Algorithm 1: The SleeperNets Attack
Open Source Code | Yes | Code is attached to the paper submission and provided anonymously here.
Open Datasets | Yes | We evaluate each method on a suite of 6 diverse environments against agents trained using the cleanrl [10] implementation of PPO [31]. First, to replicate and validate the results of [4] and [14] we test all attacks on Atari Breakout and Qbert from the Atari gymnasium suite [1]. In our evaluation we found that these environments are highly susceptible to backdoor poisoning attacks, thus we extend and focus our study towards the following 4 environments: Car Racing from the Box2D gymnasium [1], Safety Car from Safety Gymnasium [12], Highway Merge from Highway Env [18], and Trading BTC from Gym Trading Env [27].
Dataset Splits | No | The paper does not explicitly mention using a 'validation' dataset split for hyperparameter tuning or model selection in the context of data partitioning, though PPO is used, which handles this differently.
Hardware Specification | Yes | Machines Used in Experimental Results (Machine: CPU, GPU, RAM). Laptop: i9-12900HX, RTX A2000, 32GB. Desktop: Threadripper PRO 5955WX, RTX 4090, 128GB. Server: Intel Xeon Silver 4114, None, 128GB.
Software Dependencies | No | The paper mentions 'cleanrl [10] implementation of PPO [31]' but does not specify version numbers for cleanrl, PPO, or other general software dependencies like Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | In Table 4 we summarize the trigger pattern, poisoning budget, target action, and values of c_low and c_high used in each environment.
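
The per-environment settings named in the Experiment Setup entry (trigger pattern, poisoning budget, target action, c_low, c_high) could be recorded in a small configuration object. The following is a minimal sketch and is not taken from the paper's code. The class name `AttackConfig`, its field types, the reading of the poisoning budget as a fraction of training steps, and the check that c_low does not exceed c_high are all assumptions made for illustration. The values in the usage lines are placeholders, not values from Table 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackConfig:
    """Hypothetical container for the settings listed under Experiment Setup."""

    environment: str         # environment name (placeholder in the example below)
    trigger_pattern: str     # free-text description of the trigger
    poisoning_budget: float  # ASSUMPTION: fraction of training steps that may be poisoned
    target_action: int       # ASSUMPTION: index of the action the backdoor should induce
    c_low: float             # lower constant; its exact role is defined in the paper
    c_high: float            # upper constant; its exact role is defined in the paper

    def __post_init__(self) -> None:
        # Reject budgets that cannot be interpreted as a fraction.
        if not 0.0 <= self.poisoning_budget <= 1.0:
            raise ValueError("poisoning_budget must lie in [0, 1]")
        # ASSUMPTION: the names suggest an ordered pair of bounds.
        if self.c_low > self.c_high:
            raise ValueError("c_low must not exceed c_high")

    def max_poisoned_steps(self, total_steps: int) -> int:
        """Upper bound on poisoned steps under the fractional-budget assumption."""
        return int(self.poisoning_budget * total_steps)


# Placeholder values only; none of these come from the paper.
example = AttackConfig(
    environment="ExampleEnv-v0",
    trigger_pattern="placeholder trigger description",
    poisoning_budget=0.25,
    target_action=0,
    c_low=-1.0,
    c_high=1.0,
)
print(example.max_poisoned_steps(1000))
```

Making the dataclass frozen keeps a configuration from being modified mid-run, which may help when comparing results across the six environments.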