SHINE: Shielding Backdoors in Deep Reinforcement Learning

Authors: Zhuowen Yuan, Wenbo Guo, Jinyuan Jia, Bo Li, Dawn Song

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further conduct extensive experiments that evaluate SHINE against three mainstream DRL backdoor attacks in various benchmark RL environments. Our results show that SHINE significantly outperforms existing defenses in mitigating these backdoor attacks."
Researcher Affiliation | Academia | University of Illinois Urbana-Champaign; University of California, Santa Barbara; Pennsylvania State University; University of Chicago; University of California, Berkeley
Pseudocode | Yes | "Algorithm 1 shows our final backdoor shielding algorithm."
Open Source Code | No | The paper does not include an unambiguous statement that the code for the described methodology is publicly available, nor does it provide a direct link to a code repository.
Open Datasets | Yes | "We follow TrojDRL and select three Atari games from the OpenAI Gym (Brockman et al., 2016) environment pool: Pong, Breakout, and Space Invaders." (See the environment sketch below.)
Dataset Splits | No | The paper describes collecting trajectories and retraining agents, but it does not specify explicit training/validation/test dataset splits with percentages or counts for reproduction.
Hardware Specification | Yes | "On average, the trigger detection stage of SHINE takes 12 hours, and the retraining stage takes 5 hours on a single NVIDIA RTX A6000 GPU."
Software Dependencies | No | The paper mentions using 'pytorch', 'gpytorch', and 'stable-baseline' but does not specify their version numbers.
Experiment Setup | Yes | "The key hyper-parameters introduced by our method are the weight of the elastic-net regularization term in the feature-level explanation λ and the strength of the KL constraint in the policy retraining ϵ. We set λ to 1e-4 and ϵ to 0.01." (See the hyper-parameter sketch below.)
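
The Open Datasets row names the three Atari games but not the exact Gym environment IDs. A minimal sketch of loading them, assuming the standard `NoFrameskip-v4` variants (an assumption; the paper names only the games):

```python
# Hedged sketch: instantiate the three Atari tasks named in the paper from the
# OpenAI Gym environment pool (Brockman et al., 2016). The environment IDs and
# version suffixes below are assumptions, not taken from the paper.
import gym

ENV_IDS = {
    "Pong": "PongNoFrameskip-v4",
    "Breakout": "BreakoutNoFrameskip-v4",
    "Space Invaders": "SpaceInvadersNoFrameskip-v4",
}

envs = {name: gym.make(env_id) for name, env_id in ENV_IDS.items()}
for name, env in envs.items():
    # Observation/action spaces fix the agent's input/output shapes.
    print(name, env.observation_space.shape, env.action_space)
```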
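The Experiment Setup row quotes the two hyper-parameters (λ = 1e-4, ϵ = 0.01) but not the objectives they enter. A hedged PyTorch sketch of one plausible reading; the function names, the fidelity term, and the exact form of each penalty are illustrative assumptions, not SHINE's published losses:

```python
# Hedged sketch: where the two quoted hyper-parameters might appear.
# LAMBDA weights an elastic-net (L1 + L2) penalty on a feature-level
# explanation mask; EPSILON bounds a KL divergence during policy retraining.
import torch
import torch.nn.functional as F

LAMBDA = 1e-4   # weight of the elastic-net regularization term (from the paper)
EPSILON = 0.01  # strength of the KL constraint in policy retraining (from the paper)

def explanation_loss(mask: torch.Tensor, fidelity: torch.Tensor) -> torch.Tensor:
    """Fidelity objective plus an elastic-net penalty on the feature mask.
    The fidelity term itself is assumed given; its form is not sketched here."""
    elastic_net = mask.abs().sum() + mask.pow(2).sum()
    return fidelity + LAMBDA * elastic_net

def retraining_kl(new_logits: torch.Tensor, old_logits: torch.Tensor) -> torch.Tensor:
    """KL(old policy || new policy) over action distributions, to be kept
    below EPSILON, e.g., as a trust-region constraint or a penalty term."""
    old_log_probs = F.log_softmax(old_logits, dim=-1)
    new_log_probs = F.log_softmax(new_logits, dim=-1)
    return F.kl_div(new_log_probs, old_log_probs,
                    log_target=True, reduction="batchmean")
```

Under this reading, ϵ limits how far the retrained policy may drift from the original agent in each update, analogous to a TRPO-style trust region; whether SHINE enforces it as a hard constraint or a penalty is not specified in the quoted excerpt.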