Learning What To Do by Simulating the Past

Authors: David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan

Venue: ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Deep RLSP on MuJoCo environments and show that it can recover fairly good performance on the task reward given access to a small number of states sampled from a policy optimized for that reward.
Researcher Affiliation | Academia | David Lindner, Department of Computer Science, ETH Zurich (david.lindner@inf.ethz.ch); Rohin Shah, Pieter Abbeel & Anca Dragan, Center for Human-Compatible AI, UC Berkeley ({rohinmshah,pabbeel,anca}@berkeley.edu)
Pseudocode | Yes | Algorithm 1: The DEEP RLSP algorithm (an illustrative sketch of this loop is given after the table).
Open Source Code | Yes | We provide code to replicate our experiments at https://github.com/HumanCompatibleAI/deep-rlsp.
Open Datasets | No | The paper generates its own data through "random rollouts" or "environment interactions" but does not provide access information (link, citation, or repository) for a publicly available, pre-existing dataset used for training.
Dataset Splits | No | The paper does not explicitly report training/validation/test splits (e.g., percentages or sample counts) for its experiments. Data is generated through rollouts and used directly for training models rather than being split from a fixed dataset.
Hardware Specification | No | The paper mentions using the MuJoCo physics simulator and its environments, but does not report hardware details such as GPU or CPU models, processor types, or memory used to run the experiments.
Software Dependencies | No | The paper mentions several software components, including the TensorFlow framework, OpenAI Gym, Soft Actor-Critic (SAC), and stable-baselines, but does not give version numbers for these dependencies, which are needed for full reproducibility.
Experiment Setup | Yes | The hyperparameters of our experiments are described in detail in Appendix B. For example, B.1 Feature Function: "The latent space has dimension 30"; "trained for 100 epochs on 100 rollouts of a random policy in the environment. During training we use a batch size of 500 and a learning rate of 10^-5." B.5 DEEP RLSP HYPERPARAMETERS: "learning rate of 0.01", "200 forward and backward trajectories", "algorithm until T = 10" (these values are consolidated in a configuration sketch after the table).
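The Pseudocode row refers to Algorithm 1, which alternates backward simulation of the past from observed states with forward rollouts of the current policy. The Python sketch below is only an illustration of that loop under simplifying assumptions: the reward is taken to be linear in learned features, the likelihood gradient is replaced by a plain feature-matching update, and every model interface (phi, inverse_policy, inverse_dynamics, policy_rollout, train_policy) is a hypothetical callable supplied by the caller, not the paper's actual implementation.

```python
import numpy as np

def deep_rlsp_sketch(observed_states, phi, inverse_policy, inverse_dynamics,
                     policy_rollout, train_policy, horizon=10,
                     n_trajectories=200, learning_rate=0.01, iterations=50):
    """Illustrative sketch of the Deep RLSP loop (not the paper's exact algorithm).

    All arguments after `observed_states` are hypothetical callables standing in
    for the learned models (feature function, inverse policy, inverse dynamics,
    forward policy rollout, and SAC-style policy training).
    """
    feature_dim = phi(observed_states[0]).shape[0]
    theta = np.zeros(feature_dim)  # linear reward weights: r(s) = theta . phi(s)

    for _ in range(iterations):
        backward_feats, forward_feats = [], []
        for _ in range(n_trajectories):
            # Simulate the past: walk backwards from an observed state using
            # the learned inverse policy and inverse dynamics models.
            s = observed_states[np.random.randint(len(observed_states))]
            backward_traj = [s]
            for _ in range(horizon):
                a = inverse_policy(s)        # action that likely led to s
                s = inverse_dynamics(s, a)   # predecessor state
                backward_traj.append(s)
            backward_feats.append(np.mean([phi(x) for x in backward_traj], axis=0))

            # Simulate the future: roll the current policy forward from the
            # start of the simulated past trajectory.
            forward_traj = policy_rollout(backward_traj[-1], horizon)
            forward_feats.append(np.mean([phi(x) for x in forward_traj], axis=0))

        # Simplified feature-matching update: move theta toward features seen
        # along the simulated past and away from features the current policy
        # produces (the paper derives a proper likelihood gradient instead).
        grad = np.mean(backward_feats, axis=0) - np.mean(forward_feats, axis=0)
        theta += learning_rate * grad

        # Re-optimize the policy for the current reward (the paper uses SAC).
        train_policy(lambda s: float(theta @ phi(s)))

    return theta
```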
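For reference, the Appendix B values quoted in the Experiment Setup row can be collected into a single configuration record. The dictionary below only restates those numbers; the key names are our own and do not come from the paper or its codebase.

```python
# Consolidation of the Appendix B hyperparameters quoted above; key names are
# illustrative and not taken from the paper or its released code.
DEEP_RLSP_HYPERPARAMS = {
    "feature_function": {      # Appendix B.1 (VAE feature function)
        "latent_dim": 30,
        "train_epochs": 100,
        "num_random_rollouts": 100,
        "batch_size": 500,
        "learning_rate": 1e-5,
    },
    "deep_rlsp": {             # Appendix B.5 (main algorithm)
        "learning_rate": 0.01,
        "num_forward_backward_trajectories": 200,
        "horizon_T": 10,
    },
}
```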