Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

Authors: Samuel Ainsworth, Matt Barnes, Siddhartha Srinivasa

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods."
Researcher Affiliation | Academia | "School of Computer Science and Engineering, University of Washington. {skainswo,mbarnes,siddh}@cs.washington.edu"
Pseudocode | Yes | "Algorithm 1: Resetting based on demonstrator trajectories"
Open Source Code | No | The paper mentions "our implementation of DDPG" and uses external libraries such as JAX and OpenAI Gym, but it does not provide an explicit statement about releasing its own source code or a link to a repository for the described methodology.
Open Datasets | Yes | "We evaluate LEARNEDESTOP on a modified Frozen Lake-v0 environment from the Open AI gym. This environment is highly stochastic: for example, taking a left action can move the character either up, left, or down each with probability 1/3." It also uses the "Half Cheetah-v3 environment from the Open AI gym [7]".
Dataset Splits | No | The paper describes using a "discount factor of γ = 0.99" and replacing "half of the states with the lowest hitting probabilities" in its experiments, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not report the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper cites "JAX: composable transformations of Python+NumPy programs, 2018" and "Open AI Gym, 2016", but it does not provide a comprehensive list of software dependencies with version numbers beyond these external tools.
Experiment Setup | Yes | "To encourage the agent to reach the goal quickly, we use a discount factor of γ = 0.99." "Half of the states with the lowest hitting probabilities were replaced with e-stops." "For the sake of simplicity we implemented e-stops as min/max bounds on state values."
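To make the reported setup more concrete, below is a minimal sketch of how "e-stops as min/max bounds on state values" could be realized as a Gym environment wrapper. This is not the authors' implementation (the paper releases no code); the class name EStopWrapper, the demo_observations argument, and the margin parameter are illustrative assumptions, and only standard gym/NumPy calls are used.

```python
import numpy as np
import gym


class EStopWrapper(gym.Wrapper):
    """Illustrative sketch (not the authors' code): end an episode early
    whenever the observation leaves per-dimension min/max bounds derived
    from demonstrator trajectories."""

    def __init__(self, env, demo_observations, margin=0.0):
        super().__init__(env)
        demo = np.asarray(demo_observations)   # shape (num_demo_steps, obs_dim)
        self.low = demo.min(axis=0) - margin   # lower e-stop bound per dimension
        self.high = demo.max(axis=0) + margin  # upper e-stop bound per dimension

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if np.any(obs < self.low) or np.any(obs > self.high):
            done = True            # trigger the emergency stop (episode reset)
            info["e_stop"] = True  # hypothetical flag for logging/diagnostics
        return obs, reward, done, info
```

Usage would look roughly like env = EStopWrapper(gym.make("HalfCheetah-v3"), demo_observations=demo_obs), after which any standard RL algorithm (e.g., the paper's DDPG variant with γ = 0.99) trains on the wrapped environment; the discrete Frozen Lake experiments instead replace low-hitting-probability states with e-stops, which this sketch does not cover.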