Guiding Safe Exploration with Weakest Preconditions

Authors: Greg Anderson, Swarat Chaudhuri, Isil Dillig

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the approach on a suite of continuous control benchmarks and show that it can achieve comparable performance to existing safe learning techniques while incurring fewer safety violations. |
| Researcher Affiliation | Academia | Greg Anderson, Swarat Chaudhuri, Isil Dillig; Department of Computer Science, The University of Texas at Austin, Austin, TX, USA; {ganderso, swarat, isil}@cs.utexas.edu |
| Pseudocode | Yes | Algorithm 1: The main learning algorithm |
| Open Source Code | Yes | SPICE is available at https://github.com/gavlegoat/spice. |
| Open Datasets | Yes | We test SPICE using the benchmarks considered in Anderson et al. (2020). [...] Our experiments are taken from Anderson et al. (2020), and consist of 10 environments with continuous state and action spaces. |
| Dataset Splits | No | The paper describes how data is gathered during the reinforcement learning process (e.g., 'We gather real data for 10 episodes for each model update then collect data from 70 simulated episodes'), but it does not specify explicit train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper states that 'Compute resources for the experiments were provided by the Texas Advanced Computing Center' but does not specify particular GPU or CPU models or other hardware components. |
| Software Dependencies | No | The paper mentions software components such as PyEarth (Rudy, 2013), CVXOPT (Anderson et al., 2022), MBPO (Janner et al., 2019), and Soft Actor-Critic (Haarnoja et al., 2018a), but does not provide specific version numbers for these software packages. |
| Experiment Setup | Yes | Further details of the benchmarks and hyperparameters are given in Appendix C. [...] We gather real data for 10 episodes for each model update then collect data from 70 simulated episodes before updating the environment model again. We look five time steps into the future during safety analysis. Our SAC implementation (adapted from Tandon (2018)) uses automatic entropy tuning as proposed in Haarnoja et al. (2018b). Each training process is cut off after 48 hours. We train each benchmark starting from nine distinct seeds. |
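The Experiment Setup row quotes a concrete training schedule: 10 real episodes per model update, 70 simulated episodes before the next model update, a 5-step safety-analysis horizon, a 48-hour wall-clock cutoff, and 9 random seeds. The following minimal Python sketch simply encodes those quoted settings as a configuration object and a phase schedule; the class, function, and phase names are hypothetical placeholders for illustration and are not taken from the SPICE repository.

```python
# Hypothetical sketch of the data-gathering schedule quoted in the Experiment Setup row.
# Only the numeric settings come from the paper's quoted text; every identifier below is
# an illustrative placeholder, not the authors' API.

from dataclasses import dataclass


@dataclass
class SetupConfig:
    real_episodes_per_update: int = 10   # real-environment episodes gathered per model update
    sim_episodes_per_update: int = 70    # simulated episodes collected before refitting the model
    safety_horizon: int = 5              # time steps looked ahead during safety analysis
    wall_clock_limit_hours: int = 48     # each training process is cut off after 48 hours
    num_seeds: int = 9                   # each benchmark is trained from nine distinct seeds


def training_schedule(cfg: SetupConfig, num_model_updates: int):
    """Yield (phase, count) pairs describing the alternation of real and simulated data."""
    for _ in range(num_model_updates):
        yield ("collect_real_episodes", cfg.real_episodes_per_update)
        yield ("refit_environment_model", 1)
        yield ("collect_simulated_episodes", cfg.sim_episodes_per_update)
        yield ("update_policy_sac", 1)


if __name__ == "__main__":
    for phase, count in training_schedule(SetupConfig(), num_model_updates=2):
        print(f"{phase}: {count}")
```

This only captures the alternation of real-data collection, model refitting, and simulated rollouts described in the quoted setup; the safety intervention itself (the weakest-precondition analysis over the learned model) is part of Algorithm 1 in the paper and is not reproduced here.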