Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation
Authors: Samuel Ainsworth, Matt Barnes, Siddhartha Srinivasa
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, University of Washington; {skainswo,mbarnes,siddh}@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1: Resetting based on demonstrator trajectories (a hedged Python sketch of this idea appears below the table). |
| Open Source Code | No | The paper mentions 'our implementation of DDPG' and uses external libraries such as JAX and OpenAI Gym, but it does not provide an explicit statement about releasing its own source code or a link to a repository for the described method. |
| Open Datasets | Yes | We evaluate LEARNEDESTOP on a modified FrozenLake-v0 environment from the OpenAI Gym. This environment is highly stochastic: for example, taking a left action can move the character either up, left, or down, each with probability 1/3. The continuous experiments use the HalfCheetah-v3 environment from the OpenAI Gym [7] (see the environment sketch below the table). |
| Dataset Splits | No | The paper describes using a 'discount factor of γ = 0.99' and replacing 'Half of the states with the lowest hitting probabilities' for experiments, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper cites 'JAX: composable transformations of Python+NumPy programs, 2018' and 'OpenAI Gym, 2016', but it does not provide a comprehensive list of software dependencies with version numbers for the experimental setup beyond these external tools. |
| Experiment Setup | Yes | To encourage the agent to reach the goal quickly, we use a discount factor of γ = 0.99. Half of the states with the lowest hitting probabilities were replaced with e-stops. For the sake of simplicity we implemented e-stops as min/max bounds on state values (a hedged wrapper sketch appears below the table). |
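
The pseudocode row above references Algorithm 1, resetting based on demonstrator trajectories. Below is a minimal Python sketch of that idea, assuming a discrete state space and treating a state's hitting probability as the fraction of demonstration trajectories that visit it; the estimator, the function name `estop_states`, and the `estop_fraction` default are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def estop_states(demo_trajectories, n_states, estop_fraction=0.5):
    """Flag the least-visited states as e-stop (emergency stop) states.

    demo_trajectories: iterable of trajectories, each a sequence of
        integer state indices visited by the demonstrator.
    n_states: size of the discrete state space.
    estop_fraction: fraction of states to replace with e-stops (the
        table quotes replacing half of the states).
    """
    hits = np.zeros(n_states)
    for traj in demo_trajectories:
        for s in set(traj):            # a "hit" = visited at least once
            hits[s] += 1
    hitting_prob = hits / len(demo_trajectories)

    # The estop_fraction of states with the lowest hitting probability
    # become e-stop states; entering one terminates the episode early.
    k = int(estop_fraction * n_states)
    return set(np.argsort(hitting_prob)[:k].tolist())
```

During training, a rollout would be cut short (and the environment reset) as soon as the agent enters a state in the returned set, which is the mechanism behind the speedups the paper reports.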
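
The open-datasets row quotes two OpenAI Gym environments. A minimal sketch of instantiating the discrete one with the classic (2019-era) Gym API follows; note that FrozenLake-v0's default "slippery" dynamics produce exactly the 1/3-probability transitions quoted above, and that newer Gym/Gymnasium releases have since renamed the environment and changed this API.

```python
import gym

# FrozenLake-v0 is "slippery" by default: the intended move and the two
# perpendicular moves each occur with probability 1/3.
env = gym.make('FrozenLake-v0')

obs = env.reset()                                  # classic Gym API (pre-0.26)
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```

The continuous experiments use HalfCheetah-v3, which additionally requires the MuJoCo bindings to be installed.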
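
The experiment-setup row says e-stops were implemented as min/max bounds on state values. One natural way to realize that is a Gym wrapper that ends the episode whenever any observation dimension leaves a box; this wrapper, including how the bounds would be obtained (e.g., per-dimension min/max over demonstrator states), is our assumption rather than the paper's implementation.

```python
import gym
import numpy as np

class BoxEStopWrapper(gym.Wrapper):
    """Terminate the episode early when the observation leaves [low, high]."""

    def __init__(self, env, low, high):
        super().__init__(env)
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if np.any(obs < self.low) or np.any(obs > self.high):
            done = True                # emergency stop: force a reset
            info['estop'] = True
        return obs, reward, done, info

# Hypothetical usage: bound the agent to the region covered by demonstrations,
# where demo_states is an (N, obs_dim) array of demonstrator observations.
# env = BoxEStopWrapper(gym.make('HalfCheetah-v3'),
#                       low=demo_states.min(axis=0),
#                       high=demo_states.max(axis=0))
```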