Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning
Authors: Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Furong Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5: Experiments and Discussion |
| Researcher Affiliation | Academia | Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Furong Huang. Shanghai AI Lab; University of Maryland, College Park. cheryllLiang@outlook.com, {ycs,rzheng12,furongh}@umd.edu |
| Pseudocode | Yes | The pseudocodes of WocaR-PPO and WocaR-DQN are illustrated in Appendix C.2 and Appendix C.3. |
| Open Source Code | Yes | The code of this work is available at https://github.com/umd-huang-lab/WocaR-RL. |
| Open Datasets | Yes | Environments. Following most prior works [54, 52, 33] and the released implementation, we apply our WocaR-RL to PPO [39] on 4 MuJoCo tasks with continuous action spaces, including Hopper, Walker2d, Halfcheetah and Ant, and to DQN [32] agents on 4 Atari games including Pong, Freeway, Bank Heist and Road Runner, which have high dimensional pixel inputs and discrete action spaces. (A minimal environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper describes training and evaluation procedures but does not specify train/validation/test splits with percentages or counts. This is typical of RL, where experience is generated by interacting with dynamic environments rather than drawn from a fixed dataset. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA 2080 Ti GPUs for MuJoCo environments and 4 NVIDIA 2080 Ti GPUs for Atari environments. |
| Software Dependencies | No | The paper mentions base DRL algorithms such as PPO [39] and DQN [32] and the auto_LiRPA toolbox [51], but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | More implementation and hyperparameter details are provided in Appendix D.1. |
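
As a quick reproducibility aid for the environments listed above, the sketch below instantiates the named MuJoCo tasks and Atari games and takes one random step in each. The specific environment IDs, the classic (pre-0.26) Gym API, and the NoFrameskip Atari variants are assumptions for illustration; they are not taken from the paper or the released WocaR-RL code.

```python
import gym  # assumes the classic Gym API (pre-0.26); gymnasium uses a different reset/step signature

# Environment IDs are assumptions mapped from the task names in the paper,
# not taken from the released WocaR-RL implementation.
MUJOCO_ENVS = ["Hopper-v3", "Walker2d-v3", "HalfCheetah-v3", "Ant-v3"]
ATARI_ENVS = [
    "PongNoFrameskip-v4",
    "FreewayNoFrameskip-v4",
    "BankHeistNoFrameskip-v4",
    "RoadRunnerNoFrameskip-v4",
]


def smoke_test(env_id: str) -> None:
    """Instantiate the environment and take one random action."""
    env = gym.make(env_id)
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    print(f"{env_id}: obs shape {env.observation_space.shape}, action space {env.action_space}")
    env.close()


if __name__ == "__main__":
    for env_id in MUJOCO_ENVS + ATARI_ENVS:
        smoke_test(env_id)
```

Running this requires the MuJoCo and Atari extras for Gym to be installed; it only verifies that the environments referenced by the paper can be constructed, not that the paper's training setup is reproduced.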