reproducibilityindex.ai

Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes

Authors: Akifumi Wachi, Yanan Sui, Yisong Yue, Masahiro Ono

AAAI 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using the real Martian terrain data. We evaluate our approach on two problem settings. One is with randomly generated environments on which we perform Monte-Carlo simulations, and the other is with environments created from real Martian terrain data.
Researcher Affiliation	Collaboration	Akifumi Wachi, University of Tokyo, wachi@space.t.u-tokyo.ac.jp; Yanan Sui and Yisong Yue, California Institute of Technology, {ysui, yyue}@caltech.edu; Masahiro Ono, Jet Propulsion Laboratory, California Institute of Technology, ono@jpl.nasa.gov
Pseudocode	Yes	Algorithm 1 SAFEEXPOPT-MDP 1: loop 2: Observe reward and safety values of the current state, and update GP models 3: Update S^safe_t, S^uncertain_t, and S^unsafe_t 4: Compute Ĵ_N for Optimistic MDP and J_N for Pessimistic MDP by Eq. (5) and Eq. (6) 5: Derive the optimal policy to maximize η Ĵ_N + (1 − η) J_N by Eq. (8) 6: Execute the next optimal action 7: end loop
Open Source Code	No	The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described.
Open Datasets	Yes	The elevation map is derived from the digital terrain models (DEMs), created from HiRISE camera on the Mars Reconnaissance Orbiter (McEwen et al. 2007).
Dataset Splits	No	The paper refers to using datasets for simulations (randomly generated and Martian terrain data) and notes that 'GP hyper-parameters are obtained through trial and error.' However, it does not provide specific dataset split information (e.g., percentages, sample counts, or clear cross-validation setup) needed to reproduce the data partitioning into training, validation, or test sets.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper mentions algorithms and kernels used (e.g., GP with RBF kernel, Matern kernel, BEB algorithm) but does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup	Yes	For each simulation, GP hyper-parameters are obtained through trial and error. As for the two Mars simulations, we refer to the previous work by (Turchetta, Berkenkamp, and Krause 2016) and use similar parameters. ... The exploring agent predicts the safety function g via GP with a Radial Basis Function (RBF) kernel with the lengthscales being 2.0 and the prior variance of safety being 1.5. ... The discount rate is set to 0.90 and beta is set as β_t = 2, ∀t ≥ 0. The Optimistic and Pessimistic MDPs are solved by the BEB algorithm with the weight coefficient of 2.5 for the exploration bonus. ... The weight coefficient between optimistic and pessimistic policies is set to η = 0.50.
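
Since no source code is released, the sketch below shows one way the quoted setup values could fit together. It is not the authors' implementation. It collects the reported hyper-parameters in a config object, computes a zero-mean GP posterior for the safety function with the RBF kernel, forms confidence bounds, and evaluates the weighted objective from step 5 of Algorithm 1. Several things are assumptions made for illustration: all names (`Setup`, `safety_bounds`, `blended_value`), the observation noise variance, the bound form mean ± β·std (the paper may scale β differently), and the toy observations. The BEB solver and the safe/uncertain/unsafe set updates are omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """Values quoted in the Experiment Setup row; field names are illustrative."""
    lengthscale: float = 2.0      # RBF lengthscale for the safety GP
    prior_variance: float = 1.5   # prior variance of the safety function
    discount: float = 0.90        # discount rate (unused in this sketch)
    beta: float = 2.0             # confidence scaling beta_t
    beb_weight: float = 2.5       # BEB exploration-bonus weight (unused here)
    eta: float = 0.50             # optimistic/pessimistic weight
    noise_variance: float = 1e-4  # ASSUMPTION: not stated in the summary


def rbf(x, y, cfg):
    """RBF kernel: variance * exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return cfg.prior_variance * math.exp(-sq / (2.0 * cfg.lengthscale ** 2))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def safety_bounds(train_x, train_y, query, cfg):
    """Zero-mean GP posterior at `query`, returned as (lower, upper)."""
    gram = [[rbf(p, q, cfg) + (cfg.noise_variance if i == j else 0.0)
             for j, q in enumerate(train_x)] for i, p in enumerate(train_x)]
    k_star = [rbf(p, query, cfg) for p in train_x]
    mean = sum(k * a for k, a in zip(k_star, solve(gram, train_y)))
    reduction = sum(k * v for k, v in zip(k_star, solve(gram, k_star)))
    std = math.sqrt(max(rbf(query, query, cfg) - reduction, 0.0))
    # ASSUMPTION: bounds are mean +/- beta * std.
    return mean - cfg.beta * std, mean + cfg.beta * std


def blended_value(j_optimistic, j_pessimistic, cfg):
    """Step 5 objective: eta * J_optimistic + (1 - eta) * J_pessimistic."""
    return cfg.eta * j_optimistic + (1.0 - cfg.eta) * j_pessimistic


cfg = Setup()
observed_states = [(0.0, 0.0), (1.0, 0.0)]  # made-up grid coordinates
observed_safety = [1.0, 0.8]                # made-up safety observations
near = safety_bounds(observed_states, observed_safety, (0.0, 1.0), cfg)
far = safety_bounds(observed_states, observed_safety, (10.0, 10.0), cfg)
print("near:", near)
print("far:", far)
print("blend:", blended_value(10.0, 4.0, cfg))
```

In this toy run the interval at the state next to the observations is narrower than at the distant state, where the posterior falls back to the prior. This is the behaviour a safe-exploration rule relies on when deciding which states can be certified.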