reproducibilityindex.ai

Safe Reinforcement Learning in Constrained Markov Decision Processes

Authors: Akifumi Wachi, Yanan Sui

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In our experiments, we demonstrate the effectiveness of SNO-MDP through two experiments: one uses a synthetic data in a new, openly-available environment named GP-SAFETY-GYM, and the other simulates Mars surface exploration by using real observation data.
Researcher Affiliation	Collaboration	¹IBM Research AI, Tokyo, Japan; ²Tsinghua University, Beijing, China. Correspondence to: Akifumi Wachi <akifumi.wachi@ibm.com>, Yanan Sui <ysui@tsinghua.edu.cn>.
Pseudocode	Yes	Algorithm 1 SNO-MDP with ES2
Open Source Code	Yes	We build an openly-available test-bed called GP-SAFETY-GYM for synthetic experiments.¹ The safety and efficiency of SNO-MDP are then evaluated with two experiments: one in the GP-SAFETY-GYM synthetic environment, and the other using real Mars terrain data. ¹https://github.com/akifumi-wachi-4/safe_near_optimal_mdp
Open Datasets	No	The paper mentions using a "synthetic data in a new, openly-available environment named GP-SAFETY-GYM" and "real observation data" for Mars surface exploration. It also states "We created a 40 × 30 rectangular grid-world by clipping a region around latitude 30°6′ south and longitude 202°2′ east, as shown in Figure 4." However, it does not provide specific access information (link, DOI, citation with authors/year) for these datasets to confirm public availability.
Dataset Splits	No	The paper mentions using a 20 × 20 square grid for synthetic data and a 40 × 30 rectangular grid-world for Mars data, but does not provide specific train/validation/test splits, percentages, or absolute counts for the datasets used.
Hardware Specification	No	The paper does not specify any hardware details (e.g., GPU models, CPU types, memory amounts) used for running the experiments.
Software Dependencies	No	The paper mentions using "Open AI Safety-Gym" as a basis for their GP-SAFETY-GYM environment and specifies "Gaussian processes (GPs, see Rasmussen (2004))" and "Matérn kernel with ν = 5/2", "RBF kernel". However, it does not provide specific version numbers for any software components or libraries.
Experiment Setup	Yes	In this simulation, we allowed the agent to observe the reward and safety function values of the current state and neighboring states. The kernel for reward was a radial basis function (RBF) with the length-scales of 2 and prior variance of 1. The kernel for safety was also an RBF with the length-scales of 2 and prior variance of 1. Finally, we set the discount factor to γ = 0.99, and confidence intervals parameters to αt = 3 and βt = 2 for all t ≥ 1. ... We set the confidence levels as αt = 3 and βt = 2, t ≥ 0, and the discount factor as γ = 0.9.
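The Experiment Setup row reports GP hyperparameters but no code. As a rough aid to reproduction, the sketch below fits a zero-mean GP to a few observed cells of a 20 × 20 grid using an RBF kernel with the reported length-scale 2 and prior variance 1, then derives confidence intervals from it. Several details are assumptions, not taken from the paper: distances are measured in grid-cell units, the observation noise is 1e-4, the interval takes the GP-UCB-style form μ ± β^{1/2}σ with βt = 2 applied to safety (αt = 3 would play the same role for reward), the observed values and the threshold h = 0 are made up, the agent sees its 4-neighbours, and a cell counts as safe when its lower bound is at least h, with no reachability or returnability checks that a full safe-exploration algorithm may also apply. The Matérn ν = 5/2 kernel mentioned under Software Dependencies is included as a drop-in alternative. All function names are hypothetical; this is not the authors' implementation, and the linked repository remains the authoritative reference.

```python
import math


# Kernels over grid coordinates (assumption: distance measured in grid cells).
def rbf(x, y, length_scale=2.0, variance=1.0):
    """Squared-exponential (RBF) kernel; defaults are the reported length-scale 2, prior variance 1."""
    r2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * r2 / length_scale ** 2)


def matern52(x, y, length_scale=2.0, variance=1.0):
    """Matern kernel with nu = 5/2, a drop-in alternative to rbf."""
    s = math.sqrt(5.0) * math.dist(x, y) / length_scale
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, queries, kernel=rbf, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at each query point (noise is hypothetical)."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    weights = solve_upper_t(L, solve_lower(L, y))  # (K + noise I)^{-1} y
    means, stds = [], []
    for q in queries:
        k = [kernel(q, a) for a in X]
        v = solve_lower(L, k)  # k^T (K + noise I)^{-1} k equals |v|^2
        means.append(sum(ki * wi for ki, wi in zip(k, weights)))
        stds.append(math.sqrt(max(kernel(q, q) - sum(vi * vi for vi in v), 0.0)))
    return means, stds


def confidence_bounds(mean, std, scale):
    """Interval [mean - sqrt(scale)*std, mean + sqrt(scale)*std] (assumed GP-UCB form)."""
    width = math.sqrt(scale) * std
    return mean - width, mean + width


def certified_safe(cells, means, stds, beta=2.0, h=0.0):
    """Cells whose lower safety bound is at least the hypothetical threshold h."""
    return [c for c, m, s in zip(cells, means, stds)
            if confidence_bounds(m, s, beta)[0] >= h]


# Usage with made-up observations: the agent at (0, 0) sees its own cell and its in-grid 4-neighbours.
observed = [(0, 0), (0, 1), (1, 0)]
safety_values = [0.8, 0.6, 0.7]  # hypothetical safety function values
grid = [(i, j) for i in range(20) for j in range(20)]
mu, sd = gp_posterior(observed, safety_values, grid)
safe = certified_safe(grid, mu, sd, beta=2.0, h=0.0)
```

Cells near the observations get a small posterior σ and can be certified safe, while distant cells revert to the prior (σ ≈ 1) and fail the lower-bound test. Passing kernel=matern52 to gp_posterior gives the ν = 5/2 variant. The hand-written Cholesky only keeps the sketch dependency-free; a real reproduction would more naturally use an established GP library.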