Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning

Authors: Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Furong Huang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Experiments and Discussion
Researcher Affiliation | Academia | Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Furong Huang. Shanghai AI Lab; University of Maryland, College Park. cheryllLiang@outlook.com, {ycs,rzheng12,furongh}@umd.edu
Pseudocode | Yes | "The pseudocodes of WocaR-PPO and WocaR-DQN are illustrated in Appendix C.2 and Appendix C.3."
Open Source Code | Yes | "The code of this work is available at https://github.com/umd-huang-lab/WocaR-RL."
Open Datasets | Yes | "Environments. Following most prior works [54, 52, 33] and the released implementation, we apply our WocaR-RL to PPO [39] on 4 MuJoCo tasks with continuous action spaces, including Hopper, Walker2d, Halfcheetah and Ant, and to DQN [32] agents on 4 Atari games including Pong, Freeway, Bank Heist and Road Runner, which have high-dimensional pixel inputs and discrete action spaces." (See the environment sketch after this table.)
Dataset Splits | No | The paper describes training and testing procedures but does not explicitly specify train/validation/test splits with percentages or counts; fixed dataset splits are uncommon in RL, where data is generated by interaction with dynamic environments rather than drawn from a static dataset.
Hardware Specification | Yes | "All experiments are conducted on 8 NVIDIA 2080 Ti GPUs for MuJoCo environments and 4 NVIDIA 2080 Ti GPUs for Atari environments."
Software Dependencies | No | The paper mentions the base DRL algorithms PPO [39] and DQN [32] and the auto_LiRPA toolbox [51], but does not specify concrete version numbers for any software dependencies.
Experiment Setup | Yes | "More implementation and hyperparameter details are provided in Appendix D.1."
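Environment sketch. As a starting point for reproduction, the following is a minimal sketch (not taken from the paper) of how the MuJoCo tasks and Atari games listed in the Open Datasets row could be instantiated with OpenAI Gym. The environment ID suffixes ("-v3", "NoFrameskip-v4") are assumptions and may differ from the authors' released configuration.

```python
# Minimal sketch with assumed environment IDs, not the authors' exact setup.
# Requires gym installed with the mujoco and atari extras.
import gym

# Continuous-control MuJoCo tasks evaluated with PPO in the paper.
mujoco_tasks = ["Hopper-v3", "Walker2d-v3", "HalfCheetah-v3", "Ant-v3"]

# Atari games with pixel observations and discrete actions, evaluated with DQN.
atari_games = [
    "PongNoFrameskip-v4",
    "FreewayNoFrameskip-v4",
    "BankHeistNoFrameskip-v4",
    "RoadRunnerNoFrameskip-v4",
]

# Print observation and action spaces to confirm the environments resolve.
for env_id in mujoco_tasks + atari_games:
    env = gym.make(env_id)
    print(env_id, env.observation_space, env.action_space)
    env.close()
```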