Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations
Authors: Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Mingyan Liu, Duane Boning, Cho-Jui Hsieh
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on 10 environments ranging from Atari games with discrete actions to complex control tasks in continuous action space. Our proposed method significantly improves robustness under strong white-box attacks on state observations, including two strong attacks we design, the robust Sarsa attack (RS attack) and maximal action difference attack (MAD attack). |
| Researcher Affiliation | Collaboration | Huan Zhang* (UCLA), Hongge Chen* (MIT), Chaowei Xiao (NVIDIA), Bo Li (UIUC), Mingyan Liu (University of Michigan), Duane Boning (MIT), Cho-Jui Hsieh (UCLA) |
| Pseudocode | Yes | We show the full SA-DDPG algorithm in Appendix G.1. ... We defer the details on solving RDQN(θ) and full SA-DQN algorithm to Appendix H. |
| Open Source Code | Yes | Our code is available at https://github.com/chenhongge/StateAdvDRL. |
| Open Datasets | Yes | We use the PPO implementation from [14], which conducted hyperparameter search and published the optimal hyperparameters for PPO on three Mujoco environments in OpenAI Gym [7]. ... We implement Double DQN [72] and Prioritized Experience Replay [58] on four Atari games. |
| Dataset Splits | No | The paper uses standard RL environments like Mujoco and Atari. While it mentions hyperparameter tuning (e.g., for κPPO and κDDPG), it does not specify explicit dataset splits (e.g., percentages or sample counts for training, validation, and test sets) in the way a supervised learning paper would. RL typically uses episodes/steps for training and evaluation without formal dataset partitions. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU or CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions using implementations like PPO from [14], DDPG from [61], Double DQN [72], Prioritized Experience Replay [58], and auto_LiRPA [83], but it does not specify concrete version numbers for these software dependencies (e.g., PyTorch 1.x, Python 3.x). |
| Experiment Setup | Yes | We use their optimal hyperparameters for PPO, and the same set of hyperparameters for SA-PPO without further tuning. ... SA-PPO has one additional regularization parameter, κPPO, for the regularizer RPPO, which is chosen in {0.003, 0.01, 0.03, 0.1, 0.3, 1.0}. ... We run Walker2d and Hopper 2 × 10^6 steps and Humanoid 1 × 10^7 steps to ensure convergence. ... We train Atari agents for 6 million frames for both vanilla DQN and SA-DQN. ... Detailed hyperparameters are in Appendix F. (A hedged sketch of how κPPO might enter the PPO objective follows the table.) |
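
To make the role of κPPO in the experiment setup concrete, below is a minimal sketch of how such a coefficient could weight a state-adversarial smoothness regularizer next to the standard PPO clipped surrogate loss. The `policy.log_prob` / `policy.action_mean` interface and the random-perturbation regularizer are assumptions for illustration only; the paper's RPPO term is computed by maximizing the policy divergence over a perturbation set (e.g., via SGLD or a convex relaxation), which this sketch does not reproduce.

```python
import torch

def sa_ppo_loss(policy, states, actions, old_log_probs, advantages,
                kappa_ppo=0.01, eps_clip=0.2, perturb_eps=0.05):
    """Hypothetical sketch: PPO clipped surrogate loss plus a
    state-smoothness regularizer weighted by kappa_ppo."""
    # Standard PPO clipped surrogate objective.
    log_probs = policy.log_prob(states, actions)          # assumed interface
    ratio = torch.exp(log_probs - old_log_probs)
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * advantages,
    )
    ppo_loss = -surrogate.mean()

    # Smoothness regularizer: penalize the change in the policy's action mean
    # when the observed state is perturbed within an L_inf ball of radius
    # perturb_eps. Here the perturbation is sampled uniformly at random;
    # the paper instead maximizes this term over the ball, which this
    # simplified sketch does not do.
    noise = (torch.rand_like(states) * 2 - 1) * perturb_eps
    clean_mean = policy.action_mean(states)               # assumed interface
    perturbed_mean = policy.action_mean(states + noise)
    regularizer = ((clean_mean - perturbed_mean) ** 2).sum(dim=-1).mean()

    return ppo_loss + kappa_ppo * regularizer
```

In a hyperparameter search matching the row above, `kappa_ppo` would be swept over the grid {0.003, 0.01, 0.03, 0.1, 0.3, 1.0} quoted from the paper, with all other PPO hyperparameters held fixed.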