CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing
Authors: Fan Wu, Linyi Li, Zijian Huang, Yevgeniy Vorobeychik, Ding Zhao, Bo Li
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we apply CROP to evaluate several existing empirically robust RL algorithms, including adversarial training and different robust regularizations, in four environments (two representative Atari games, Highway, and CartPole). Furthermore, by evaluating these algorithms against adversarial attacks, we demonstrate that our certifications are often tight. All experimental results are available at https://crop-leaderboard.github.io. |
| Researcher Affiliation | Academia | Fan Wu (1), Linyi Li (1), Zijian Huang (1), Yevgeniy Vorobeychik (2), Ding Zhao (3), Bo Li (1); affiliations: (1) University of Illinois at Urbana-Champaign, (2) Washington University in St. Louis, (3) Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 (CROP-LOACT): local smoothing for certifying per-state action; Algorithm 2 (CROP-GRE): global smoothing for certifying cumulative reward; Algorithm 3 (CROP-LORE): adaptive search for certifying cumulative reward. A hedged sketch of the local-smoothing step appears below the table. |
| Open Source Code | Yes | Our implementation is publicly available at https://github.com/AI-secure/CROP. |
| Open Datasets | Yes | We experiment with two Atari-2600 environments in OpenAI Gym (Brockman et al., 2016) on top of the Arcade Learning Environment (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes experimental setups (e.g., 'We report results averaged over 10 episodes and set the length of the horizon H = 500.') and evaluation procedures, but it does not specify explicit training/validation/test dataset splits; such splits are typical of supervised learning and generally not applicable to RL environments. |
| Hardware Specification | Yes | Our experiments are conducted on GPU machines, including GeForce RTX 3090, GeForce RTX 2080 Ti, and GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions software components and frameworks like 'OpenAI Gym', 'Arcade Learning Environment', 'Double DQN', and 'Prioritized Experience Replay', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Evaluation Setup for CROP-LOACT. We report results averaged over 10 episodes and set the length of the horizon H = 500. At each time step, we sample m = 10,000 noisy states for smoothing. When applying Hoeffding's inequality, we adopt the confidence level parameter α = 0.05. Since the input state observations for the two Atari games are in image space, we rescale the input states so that each pixel falls into the range [0, 1]. When adding Gaussian noise to the rescaled states, we sample zero-mean Gaussian noise with different variances; concretely, the standard deviation σ is selected from {0.001, 0.005, 0.01, 0.03, 0.05, 0.1, 0.5, 0.75, 1.0, 1.5, 2.0, 4.0}. This setup is mirrored in the evaluation sketch after the table. |
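To make the local-smoothing step concrete, here is a minimal Python sketch in the spirit of CROP-LOACT (Algorithm 1). It is an illustration under stated assumptions, not the authors' implementation: `q_net` is a hypothetical batched Q-function returning an `(m, n_actions)` array, the Q-values are assumed to be clippable into a known `[v_min, v_max]` range, and the per-action Hoeffding interval omits the union-bound bookkeeping a full certificate would need.

```python
import numpy as np
from scipy.stats import norm

def certify_action(q_net, state, sigma, m=10_000, alpha=0.05,
                   v_min=0.0, v_max=1.0):
    """Sketch of per-state action certification via local smoothing.

    Samples m Gaussian perturbations of `state`, estimates the smoothed
    (normalized) Q-values, widens the estimate with a Hoeffding confidence
    interval, and converts the top-two gap into an L2 certified radius.
    """
    noise = np.random.normal(0.0, sigma, size=(m,) + state.shape)
    q = np.asarray(q_net(state[None, ...] + noise))           # (m, n_actions)
    q = (np.clip(q, v_min, v_max) - v_min) / (v_max - v_min)  # map to [0, 1]

    q_mean = q.mean(axis=0)
    # Hoeffding for [0, 1]-bounded means: P(|mean - E| >= d) <= 2 exp(-2 m d^2),
    # solved for d at total failure probability alpha.
    d = np.sqrt(np.log(2.0 / alpha) / (2.0 * m))
    lo = np.clip(q_mean - d, 1e-9, 1.0 - 1e-9)
    hi = np.clip(q_mean + d, 1e-9, 1.0 - 1e-9)

    a_star = int(q_mean.argmax())
    runner_up = float(np.max(np.delete(hi, a_star)))
    if lo[a_star] <= runner_up:
        return a_star, 0.0  # gap too small to certify at this confidence
    # Lipschitz property of the Gaussian-smoothed value function turns the
    # gap between Gaussian quantiles into an L2 robustness radius.
    radius = sigma / 2.0 * (norm.ppf(lo[a_star]) - norm.ppf(runner_up))
    return a_star, float(radius)
```

The radius form (σ/2 times a difference of Gaussian quantiles) follows the standard randomized-smoothing argument; consult the paper and the released code at https://github.com/AI-secure/CROP for the exact bound used.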
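And a sketch of an evaluation loop matching the reported setup (10 episodes, H = 500, m = 10,000 noisy samples per state, α = 0.05, pixels rescaled to [0, 1]). The environment id `PongNoFrameskip-v4` and the classic Gym reset/step API are assumptions for illustration; the paper's exact wrappers and trained agents live in the released code.

```python
import gym
import numpy as np

# Sigma grid quoted in the Experiment Setup row above.
SIGMAS = [0.001, 0.005, 0.01, 0.03, 0.05, 0.1,
          0.5, 0.75, 1.0, 1.5, 2.0, 4.0]

def evaluate(q_net, env_id="PongNoFrameskip-v4", episodes=10,
             horizon=500, sigma=0.01, m=10_000, alpha=0.05):
    """Average certified radius over `episodes` rollouts of length <= horizon."""
    env = gym.make(env_id)
    radii = []
    for _ in range(episodes):
        obs, done, t = env.reset(), False, 0
        while not done and t < horizon:
            state = np.asarray(obs, dtype=np.float32) / 255.0  # pixels -> [0, 1]
            action, radius = certify_action(q_net, state, sigma,
                                            m=m, alpha=alpha)
            radii.append(radius)
            obs, _, done, _ = env.step(action)
            t += 1
    return float(np.mean(radii))
```

Sweeping the noise level then amounts to `[evaluate(q_net, sigma=s) for s in SIGMAS]`, mirroring how the paper reports certification results across σ.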