SHINE: Shielding Backdoors in Deep Reinforcement Learning
Authors: Zhuowen Yuan, Wenbo Guo, Jinyuan Jia, Bo Li, Dawn Song
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further conduct extensive experiments that evaluate SHINE against three mainstream DRL backdoor attacks in various benchmark RL environments. Our results show that SHINE significantly outperforms existing defenses in mitigating these backdoor attacks. |
| Researcher Affiliation | Academia | ¹University of Illinois Urbana-Champaign, ²University of California, Santa Barbara, ³Pennsylvania State University, ⁴University of Chicago, ⁵University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 shows our final backdoor shielding algorithm. |
| Open Source Code | No | The paper does not include an unambiguous statement that the code for the described methodology is publicly available, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We follow TrojDRL and select three Atari games from the OpenAI Gym (Brockman et al., 2016) environment pool: Pong, Breakout, and Space Invaders. (A minimal environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper describes collecting trajectories and retraining agents, but it does not specify explicit training/validation/test dataset splits with percentages or counts for reproduction. |
| Hardware Specification | Yes | On average, the trigger detection stage of SHINE takes 12 hours, and the retraining stage takes 5 hours on a single NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions using 'pytorch', 'gpytorch', and 'stable-baseline' but does not specify their version numbers. |
| Experiment Setup | Yes | The key hyper-parameters introduced by our method are the weight of the elastic-net regularization term in the feature-level explanation, λ, and the strength of the KL constraint in the policy retraining, ϵ. We set λ to 1e-4 and ϵ to 0.01. (A configuration sketch follows the table.) |
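A minimal sketch of instantiating the three Atari games named in the Open Datasets row via OpenAI Gym (Brockman et al., 2016). The environment IDs and the 4-tuple `step` return assume a classic (pre-0.26) Gym release; newer Gymnasium versions use IDs such as `"ALE/Pong-v5"` and a 5-tuple return instead.

```python
# Illustrative only: spin up the three Atari environments the paper evaluates on.
import gym

for env_id in ("PongNoFrameskip-v4",
               "BreakoutNoFrameskip-v4",
               "SpaceInvadersNoFrameskip-v4"):
    env = gym.make(env_id)
    obs = env.reset()
    # One random step, just to confirm the environment is wired up.
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()
```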
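A configuration sketch for the two hyper-parameters quoted in the Experiment Setup row. Only the values λ = 1e-4 and ϵ = 0.01 come from the paper; the function and tensor names below are hypothetical, and the elastic-net formulation shown is one common variant, not necessarily the paper's exact form.

```python
import torch

LAMBDA = 1e-4   # λ: weight of the elastic-net term in the feature-level explanation
EPSILON = 0.01  # ϵ: strength of the KL constraint in policy retraining

def elastic_net_penalty(mask: torch.Tensor, lam: float = LAMBDA) -> torch.Tensor:
    """Elastic-net regularizer: a λ-weighted sum of the L1 and squared-L2
    norms of an explanation mask (hypothetical formulation)."""
    return lam * (mask.abs().sum() + mask.pow(2).sum())

def within_kl_trust_region(kl_div: torch.Tensor, eps: float = EPSILON) -> bool:
    """Check a TRPO-style trust-region constraint KL(pi_new || pi_old) <= ϵ."""
    return bool(kl_div.item() <= eps)
```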