Policy Smoothing for Provably Robust Reinforcement Learning

Authors: Aounon Kumar, Alexander Levine, Soheil Feizi

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on various environments like Cartpole, Pong, Freeway and Mountain Car show that our method can yield meaningful robustness guarantees in practice. |
| Researcher Affiliation | Academia | Aounon Kumar, Alexander Levine, Soheil Feizi; Department of Computer Science, University of Maryland, College Park, USA; aounon@umd.edu, {alevine0,sfeizi}@cs.umd.edu |
| Pseudocode | Yes | Algorithm 1: Empirical Attack on DQN Agents |
| Open Source Code | Yes | We supplement our work with accompanying code for reproducing the experimental results, as well as pre-trained models for a selection of the experiments. Details about setting hyper-parameters and the environments we test are included in the appendix. |
| Open Datasets | Yes | We tested on four standard environments: the classical control problems Cartpole and Mountain Car, and the Atari games Pong and Freeway. |
| Dataset Splits | Yes | Table 1: Training Hyperparameters for DQN models. Validation interval (steps): 100000; validation episodes: 100. |
| Hardware Specification | Yes | Each experiment is run on an NVIDIA 2080 Ti GPU. |
| Software Dependencies | No | We use DQN and DDPG implementations from the popular stable-baselines3 package (Raffin et al., 2019). |
| Experiment Setup | Yes | Table 1: Training Hyperparameters for DQN models. |
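For context on the method itself: policy smoothing certifies robustness by running the agent on Gaussian-noised observations. The sketch below is a minimal illustration of test-time smoothed action selection for a stable-baselines3 DQN on CartPole. The noise scale `sigma`, the sample count `n_samples`, and the majority-vote rule are illustrative assumptions, not the paper's exact procedure, which also trains the agent under observation noise and derives formal robustness certificates.

```python
# Minimal sketch of test-time policy smoothing, assuming the smoothed policy
# adds Gaussian noise to each observation and majority-votes over the base
# agent's actions. `sigma` and `n_samples` are illustrative, not the paper's.
import numpy as np
import gymnasium as gym
from stable_baselines3 import DQN

def smoothed_action(model, obs, sigma=0.1, n_samples=100):
    """Majority-vote action of the base policy under Gaussian observation noise."""
    votes = np.zeros(model.action_space.n, dtype=int)
    for _ in range(n_samples):
        noisy_obs = obs + np.random.normal(0.0, sigma, size=obs.shape)
        action, _ = model.predict(noisy_obs, deterministic=True)
        votes[int(action)] += 1
    return int(np.argmax(votes))

env = gym.make("CartPole-v1")
model = DQN("MlpPolicy", env)        # in practice, load a trained model instead
model.learn(total_timesteps=10_000)  # short run, for illustration only

obs, _ = env.reset()
done, total_reward = False, 0.0
while not done:
    action = smoothed_action(model, np.asarray(obs, dtype=np.float32))
    obs, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"Episode reward under the smoothed policy: {total_reward}")
```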
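The Pseudocode row refers to Algorithm 1 (Empirical Attack on DQN Agents), whose details are not quoted above. Purely as a hypothetical illustration of what such an attack can look like, the sketch below applies an FGSM-style perturbation within an l-infinity budget `epsilon` to an observation so that the greedy action's Q-value decreases; the paper's actual Algorithm 1 may differ.

```python
# Hypothetical sketch of an FGSM-style observation attack on a DQN agent.
# This is NOT the paper's Algorithm 1; it only illustrates the general idea
# of perturbing observations within an l-infinity budget `epsilon` to lower
# the Q-value of the greedy action.
import torch

def fgsm_attack(model, obs, epsilon=0.01):
    """Return an adversarially perturbed observation for an SB3 DQN `model`."""
    obs_t = torch.as_tensor(obs, dtype=torch.float32,
                            device=model.device).unsqueeze(0)
    obs_t.requires_grad_(True)
    q_values = model.q_net(obs_t)                # Q(s, .) from the online network
    greedy_q = q_values.max(dim=1).values.sum()  # Q-value of the greedy action
    greedy_q.backward()
    # Step against the gradient to make the greedy action look worse.
    adv_obs = obs_t - epsilon * obs_t.grad.sign()
    return adv_obs.detach().squeeze(0).cpu().numpy()
```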