Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning
Authors: Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Furong Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiments and Discussion |
| Researcher Affiliation | Academia | Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Furong Huang; Shanghai AI Lab; University of Maryland, College Park |
| Pseudocode | Yes | The pseudocodes of WocaR-PPO and WocaR-DQN are illustrated in Appendix C.2 and Appendix C.3. |
| Open Source Code | Yes | The code of this work is available at https://github.com/umd-huang-lab/WocaR-RL. |
| Open Datasets | Yes | Environments. Following most prior works [54, 52, 33] and the released implementation, we apply our WocaR-RL to PPO [39] on 4 MuJoCo tasks with continuous action spaces, including Hopper, Walker2d, Halfcheetah and Ant, and to DQN [32] agents on 4 Atari games including Pong, Freeway, Bank Heist and Road Runner, which have high dimensional pixel inputs and discrete action spaces. |
| Dataset Splits | No | The paper describes training and testing procedures but does not explicitly specify train/validation/test dataset splits with percentages or counts for reproducibility, which is common for dynamic RL environments. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA 2080 Ti GPUs for MuJoCo environments and 4 NVIDIA 2080 Ti GPUs for Atari environments. |
| Software Dependencies | No | The paper mentions base DRL algorithms like PPO [39] and DQN [32], and a toolbox auto_LiRPA [51], but it does not specify concrete version numbers for any software dependencies. |
| Experiment Setup | Yes | More implementation and hyperparameter details are provided in Appendix D.1. |