Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning

Authors: Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Furong Huang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Experiments and Discussion
Researcher Affiliation | Academia | Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Furong Huang. Shanghai AI Lab; University of Maryland, College Park. cheryllLiang@outlook.com, {ycs,rzheng12,furongh}@umd.edu
Pseudocode | Yes | "The pseudocodes of WocaR-PPO and WocaR-DQN are illustrated in Appendix C.2 and Appendix C.3."
Open Source Code | Yes | "The code of this work is available at https://github.com/umd-huang-lab/WocaR-RL."
Open Datasets | Yes | "Environments. Following most prior works [54, 52, 33] and the released implementation, we apply our WocaR-RL to PPO [39] on 4 MuJoCo tasks with continuous action spaces, including Hopper, Walker2d, Halfcheetah and Ant, and to DQN [32] agents on 4 Atari games including Pong, Freeway, Bank Heist and Road Runner, which have high-dimensional pixel inputs and discrete action spaces." (See the environment sketch after this table.)
Dataset Splits | No | The paper describes training and testing procedures but does not explicitly specify train/validation/test splits with percentages or counts; fixed dataset splits are uncommon in RL, where data is generated by interaction with dynamic environments rather than drawn from a static dataset.
Hardware Specification | Yes | "All experiments are conducted on 8 NVIDIA 2080 Ti GPUs for MuJoCo environments and 4 NVIDIA 2080 Ti GPUs for Atari environments."
Software Dependencies | No | The paper mentions the base DRL algorithms PPO [39] and DQN [32] and the auto_LiRPA toolbox [51], but does not specify concrete version numbers for any software dependencies.
Experiment Setup | Yes | "More implementation and hyperparameter details are provided in Appendix D.1."
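Environment sketch. As a starting point for reproduction, the following is a minimal sketch (not taken from the paper) of how the MuJoCo tasks and Atari games listed in the Open Datasets row could be instantiated with OpenAI Gym. The environment ID suffixes ("-v3", "NoFrameskip-v4") are assumptions and may differ from the authors' released configuration.

```python
# Minimal sketch with assumed environment IDs, not the authors' exact setup.
# Requires gym installed with the mujoco and atari extras.
import gym

# Continuous-control MuJoCo tasks evaluated with PPO in the paper.
mujoco_tasks = ["Hopper-v3", "Walker2d-v3", "HalfCheetah-v3", "Ant-v3"]

# Atari games with pixel observations and discrete actions, evaluated with DQN.
atari_games = [
    "PongNoFrameskip-v4",
    "FreewayNoFrameskip-v4",
    "BankHeistNoFrameskip-v4",
    "RoadRunnerNoFrameskip-v4",
]

# Print observation and action spaces to confirm the environments resolve.
for env_id in mujoco_tasks + atari_games:
    env = gym.make(env_id)
    print(env_id, env.observation_space, env.action_space)
    env.close()
```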