Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

Authors: Samuel Ainsworth, Matt Barnes, Siddhartha Srinivasa

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods."
Researcher Affiliation | Academia | "School of Computer Science and Engineering, University of Washington. {skainswo,mbarnes,siddh}@cs.washington.edu"
Pseudocode | Yes | "Algorithm 1: Resetting based on demonstrator trajectories"
Open Source Code | No | The paper mentions "our implementation of DDPG" and uses external libraries such as JAX and OpenAI Gym, but it does not provide an explicit statement about releasing its own source code or a link to a repository for the described methodology.
Open Datasets | Yes | "We evaluate LEARNEDESTOP on a modified Frozen Lake-v0 environment from the Open AI gym. This environment is highly stochastic: for example, taking a left action can move the character either up, left, or down each with probability 1/3." It also uses the "Half Cheetah-v3 environment from the Open AI gym [7]".
Dataset Splits | No | The paper describes using a "discount factor of γ = 0.99" and replacing "half of the states with the lowest hitting probabilities" in its experiments, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not report the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper cites "JAX: composable transformations of Python+NumPy programs, 2018" and "Open AI Gym, 2016", but it does not provide a comprehensive list of software dependencies with version numbers beyond these external tools.
Experiment Setup | Yes | "To encourage the agent to reach the goal quickly, we use a discount factor of γ = 0.99." "Half of the states with the lowest hitting probabilities were replaced with e-stops." "For the sake of simplicity we implemented e-stops as min/max bounds on state values."
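To make the reported setup more concrete, below is a minimal sketch of how "e-stops as min/max bounds on state values" could be realized as a Gym environment wrapper. This is not the authors' implementation (the paper releases no code); the class name EStopWrapper, the demo_observations argument, and the margin parameter are illustrative assumptions, and only standard gym/NumPy calls are used.

```python
import numpy as np
import gym


class EStopWrapper(gym.Wrapper):
    """Illustrative sketch (not the authors' code): end an episode early
    whenever the observation leaves per-dimension min/max bounds derived
    from demonstrator trajectories."""

    def __init__(self, env, demo_observations, margin=0.0):
        super().__init__(env)
        demo = np.asarray(demo_observations)   # shape (num_demo_steps, obs_dim)
        self.low = demo.min(axis=0) - margin   # lower e-stop bound per dimension
        self.high = demo.max(axis=0) + margin  # upper e-stop bound per dimension

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if np.any(obs < self.low) or np.any(obs > self.high):
            done = True            # trigger the emergency stop (episode reset)
            info["e_stop"] = True  # hypothetical flag for logging/diagnostics
        return obs, reward, done, info
```

Usage would look roughly like env = EStopWrapper(gym.make("HalfCheetah-v3"), demo_observations=demo_obs), after which any standard RL algorithm (e.g., the paper's DDPG variant with γ = 0.99) trains on the wrapped environment; the discrete Frozen Lake experiments instead replace low-hitting-probability states with e-stops, which this sketch does not cover.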