SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents
Authors: Ethan Rathbun, Christopher Amato, Alina Oprea
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods, while preserving benign episodic return. |
| Researcher Affiliation | Academia | Ethan Rathbun, Christopher Amato, Alina Oprea; Khoury College of Computer Sciences, Northeastern University |
| Pseudocode | Yes | Algorithm 1: The SleeperNets Attack |
| Open Source Code | Yes | Code is attached to the paper submission and provided anonymously here. |
| Open Datasets | Yes | We evaluate each method on a suite of 6 diverse environments against agents trained using the cleanrl [10] implementation of PPO [31]. First, to replicate and validate the results of [4] and [14] we test all attacks on Atari Breakout and Qbert from the Atari gymnasium suite [1]. In our evaluation we found that these environments are highly susceptible to backdoor poisoning attacks, thus we extend and focus our study towards the following 4 environments: Car Racing from the Box2D gymnasium [1], Safety Car from Safety Gymnasium [12], Highway Merge from Highway Env [18], and Trading BTC from Gym Trading Env [27]. |
| Dataset Splits | No | The paper does not mention a validation split for hyperparameter tuning or model selection. Agents are trained with PPO through direct interaction with the environments, so there is no fixed dataset to partition. |
| Hardware Specification | Yes | Machines Used in Experimental Results: Laptop (CPU: i9-12900HX, GPU: RTX A2000, RAM: 32GB); Desktop (CPU: Threadripper PRO 5955WX, GPU: RTX 4090, RAM: 128GB); Server (CPU: Intel Xeon Silver 4114, GPU: None, RAM: 128GB) |
| Software Dependencies | No | The paper mentions 'cleanrl [10] implementation of PPO [31]' but does not specify version numbers for cleanrl, PPO, or other general software dependencies like Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | In Table 4 we summarize the trigger pattern, poisoning budget, target action, and values of c_low and c_high used in each environment. |
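
The Software Dependencies row reports that no version numbers are given. As a minimal sketch of how someone re-running the attached code might record the versions actually installed, the snippet below queries the local environment. The helper name `collect_versions` and the package list are assumptions made for illustration; the paper does not enumerate its dependencies.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical package list; adjust it to whatever the released code imports.
    for name, version in collect_versions(["torch", "gymnasium", "numpy"]).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Saving this output next to the experiment logs would let later runs be compared against the same environment.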