CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing
Authors: Fan Wu, Linyi Li, Zijian Huang, Yevgeniy Vorobeychik, Ding Zhao, Bo Li
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we apply CROP to evaluate several existing empirically robust RL algorithms, including adversarial training and different robust regularizations, in four environments (two representative Atari games, Highway, and CartPole). Furthermore, by evaluating these algorithms against adversarial attacks, we demonstrate that our certifications are often tight. All experimental results are available at https://crop-leaderboard.github.io. |
| Researcher Affiliation | Academia | Fan Wu (1), Linyi Li (1), Zijian Huang (1), Yevgeniy Vorobeychik (2), Ding Zhao (3), Bo Li (1); affiliations: (1) University of Illinois at Urbana-Champaign, (2) Washington University in St. Louis, (3) Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 (CROP-LOACT): local smoothing for certifying per-state action; Algorithm 2 (CROP-GRE): global smoothing for certifying cumulative reward; Algorithm 3 (CROP-LORE): adaptive search for certifying cumulative reward. A hedged sketch of the local-smoothing step appears below the table. |
| Open Source Code | Yes | Our implementation is publicly available at https://github.com/AI-secure/CROP. |
| Open Datasets | Yes | We experiment with two Atari-2600 environments in OpenAI Gym (Brockman et al., 2016) on top of the Arcade Learning Environment (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes experimental setups (e.g., 'We report results averaged over 10 episodes and set the length of the horizon H = 500.') and evaluation procedures, but it does not specify explicit training/validation/test dataset splits; such splits are typical of supervised learning and generally not applicable to RL environments. |
| Hardware Specification | Yes | Our experiments are conducted on GPU machines, including GeForce RTX 3090, GeForce RTX 2080 Ti, and GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions software components and frameworks like 'OpenAI Gym', 'Arcade Learning Environment', 'Double DQN', and 'Prioritized Experience Replay', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Evaluation Setup for CROP-LOACT. We report results averaged over 10 episodes and set the length of the horizon H = 500. At each time step, we sample m = 10,000 noisy states for smoothing. When applying Hoeffding's inequality, we adopt the confidence level parameter α = 0.05. Since the input state observations for the two Atari games are in image space, we rescale the input states so that each pixel falls into the range [0, 1]. When adding Gaussian noise to the rescaled states, we sample zero-mean Gaussian noise with different variances; concretely, the standard deviation σ is selected from {0.001, 0.005, 0.01, 0.03, 0.05, 0.1, 0.5, 0.75, 1.0, 1.5, 2.0, 4.0}. This setup is mirrored in the evaluation sketch after the table. |
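To make the local-smoothing step concrete, here is a minimal Python sketch in the spirit of CROP-LOACT (Algorithm 1). It is an illustration under stated assumptions, not the authors' implementation: `q_net` is a hypothetical batched Q-function returning an `(m, n_actions)` array, the Q-values are assumed to be clippable into a known `[v_min, v_max]` range, and the per-action Hoeffding interval omits the union-bound bookkeeping a full certificate would need.

```python
import numpy as np
from scipy.stats import norm

def certify_action(q_net, state, sigma, m=10_000, alpha=0.05,
                   v_min=0.0, v_max=1.0):
    """Sketch of per-state action certification via local smoothing.

    Samples m Gaussian perturbations of `state`, estimates the smoothed
    (normalized) Q-values, widens the estimate with a Hoeffding confidence
    interval, and converts the top-two gap into an L2 certified radius.
    """
    noise = np.random.normal(0.0, sigma, size=(m,) + state.shape)
    q = np.asarray(q_net(state[None, ...] + noise))           # (m, n_actions)
    q = (np.clip(q, v_min, v_max) - v_min) / (v_max - v_min)  # map to [0, 1]

    q_mean = q.mean(axis=0)
    # Hoeffding for [0, 1]-bounded means: P(|mean - E| >= d) <= 2 exp(-2 m d^2),
    # solved for d at total failure probability alpha.
    d = np.sqrt(np.log(2.0 / alpha) / (2.0 * m))
    lo = np.clip(q_mean - d, 1e-9, 1.0 - 1e-9)
    hi = np.clip(q_mean + d, 1e-9, 1.0 - 1e-9)

    a_star = int(q_mean.argmax())
    runner_up = float(np.max(np.delete(hi, a_star)))
    if lo[a_star] <= runner_up:
        return a_star, 0.0  # gap too small to certify at this confidence
    # Lipschitz property of the Gaussian-smoothed value function turns the
    # gap between Gaussian quantiles into an L2 robustness radius.
    radius = sigma / 2.0 * (norm.ppf(lo[a_star]) - norm.ppf(runner_up))
    return a_star, float(radius)
```

The radius form (σ/2 times a difference of Gaussian quantiles) follows the standard randomized-smoothing argument; consult the paper and the released code at https://github.com/AI-secure/CROP for the exact bound used.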
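And a sketch of an evaluation loop matching the reported setup (10 episodes, H = 500, m = 10,000 noisy samples per state, α = 0.05, pixels rescaled to [0, 1]). The environment id `PongNoFrameskip-v4` and the classic Gym reset/step API are assumptions for illustration; the paper's exact wrappers and trained agents live in the released code.

```python
import gym
import numpy as np

# Sigma grid quoted in the Experiment Setup row above.
SIGMAS = [0.001, 0.005, 0.01, 0.03, 0.05, 0.1,
          0.5, 0.75, 1.0, 1.5, 2.0, 4.0]

def evaluate(q_net, env_id="PongNoFrameskip-v4", episodes=10,
             horizon=500, sigma=0.01, m=10_000, alpha=0.05):
    """Average certified radius over `episodes` rollouts of length <= horizon."""
    env = gym.make(env_id)
    radii = []
    for _ in range(episodes):
        obs, done, t = env.reset(), False, 0
        while not done and t < horizon:
            state = np.asarray(obs, dtype=np.float32) / 255.0  # pixels -> [0, 1]
            action, radius = certify_action(q_net, state, sigma,
                                            m=m, alpha=alpha)
            radii.append(radius)
            obs, _, done, _ = env.step(action)
            t += 1
    return float(np.mean(radii))
```

Sweeping the noise level then amounts to `[evaluate(q_net, sigma=s) for s in SIGMAS]`, mirroring how the paper reports certification results across σ.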