MOReL: Model-Based Offline Reinforcement Learning

Authors: Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments, we show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
Researcher Affiliation | Collaboration | Rahul Kidambi, Cornell University, Ithaca (rkidambi@cornell.edu); Aravind Rajeswaran, University of Washington, Seattle and Google Research, Brain Team (aravraj@cs.washington.edu); Praneeth Netrapalli, Microsoft Research, India (praneeth@microsoft.com); Thorsten Joachims, Cornell University, Ithaca (tj@cornell.edu)
Pseudocode | Yes | Algorithm 1: 'MOReL: Model Based Offline Reinforcement Learning' (an illustrative sketch of the pipeline appears below the table).
Open Source Code | No | Project webpage: https://sites.google.com/view/morel (the webpage states 'Code coming soon!', indicating it was not available at the time of publication).
Open Datasets | Yes | The tasks considered include Hopper-v2, HalfCheetah-v2, Ant-v2, and Walker2d-v2, illustrated in Figure 2. Five different logged datasets are used for each environment, totalling 20 environment-dataset combinations (enumerated in a sketch below the table). Datasets are collected based on the work of Wu et al. [18], with each dataset containing the equivalent of 1 million timesteps of environment interaction.
Dataset Splits | No | The paper mentions using a 'static dataset of interactions' but does not specify training, validation, or test splits in the traditional sense, as policies are evaluated via rollouts in the environment.
Hardware Specification | No | The paper acknowledges 'computing resources from the Cornell Graphite cluster' but does not give specific CPU, GPU, or memory details for the experiments.
Software Dependencies | No | The paper mentions software such as OpenAI Gym [73], MuJoCo [74], and the Adam optimizer [68], but does not provide specific version numbers for these or other libraries/frameworks.
Experiment Setup | No | The paper uses 2-layer ReLU MLPs for the dynamics models (an ensemble of 4) and a 2-layer tanh MLP for the policy, with results averaged over 5 random seeds using the same hyperparameters; however, it does not report specific values such as learning rate, batch size, or optimizer settings (an architecture sketch appears below the table).
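The algorithm is only named in the Pseudocode row above; as a rough, non-authoritative illustration of the pipeline the paper describes (learn an ensemble of dynamics models from the logged data, construct a pessimistic MDP that diverts to an absorbing HALT state wherever the ensemble disagrees, then optimize a policy inside that P-MDP), here is a minimal Python sketch. The callables `fit_model`, `train_policy`, and `reward_fn`, together with the disagreement threshold and halt penalty values, are illustrative assumptions, not the authors' code.

```python
import numpy as np

def morel_sketch(dataset, fit_model, train_policy, reward_fn,
                 n_models=4, disagreement_threshold=1.0, halt_penalty=-100.0):
    """Illustrative sketch of the MOReL pipeline (not the authors' implementation).

    fit_model(dataset, seed)       -> model with .predict(state, action)
    train_policy(step_fn, dataset) -> policy trained by rolling out step_fn
    reward_fn(state, action)       -> scalar reward (known or learned)
    """
    # 1. Learn an ensemble of dynamics models from the static offline dataset.
    models = [fit_model(dataset, seed=i) for i in range(n_models)]

    # 2. Unknown state-action detector: large ensemble disagreement marks
    #    regions that the offline data does not support.
    def is_unknown(state, action):
        preds = np.stack([m.predict(state, action) for m in models])
        disagreement = np.max(np.linalg.norm(preds - preds.mean(axis=0), axis=-1))
        return disagreement > disagreement_threshold

    # 3. Pessimistic MDP: unknown transitions end the episode with a large
    #    negative reward (the pessimism penalty), mimicking an absorbing HALT state.
    def pessimistic_step(state, action):
        if is_unknown(state, action):
            return state, halt_penalty, True  # HALT: episode terminates
        next_state = models[np.random.randint(n_models)].predict(state, action)
        return next_state, reward_fn(state, action), False

    # 4. Run any model-based policy optimizer inside the pessimistic MDP
    #    (the paper uses model-based natural policy gradient).
    return train_policy(pessimistic_step, dataset)
```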
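For concreteness, the four benchmark environments named in the Open Datasets row can be instantiated with OpenAI Gym. The sketch below only enumerates the 4 environments × 5 logged datasets = 20 combinations; it does not reproduce the data-collection policies of Wu et al. [18], whose labels are not given above, and `dataset_id` is a placeholder index.

```python
import gym  # the -v2 tasks require a MuJoCo installation

ENV_NAMES = ["Hopper-v2", "HalfCheetah-v2", "Ant-v2", "Walker2d-v2"]
N_DATASETS_PER_ENV = 5  # five logged datasets per environment (Wu et al. [18])

# 4 environments x 5 datasets = 20 environment-dataset combinations.
combinations = [(name, dataset_id)
                for name in ENV_NAMES
                for dataset_id in range(N_DATASETS_PER_ENV)]
assert len(combinations) == 20

# The environments themselves are openly available via Gym/MuJoCo.
envs = {name: gym.make(name) for name in ENV_NAMES}
```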
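The architectures named in the Experiment Setup row can be written down directly. The PyTorch sketch below assumes a hidden width and learning rate that the paper does not report, and uses Hopper-v2 state/action dimensions purely as an example.

```python
import torch
import torch.nn as nn

def two_layer_mlp(in_dim, out_dim, hidden=512, activation=nn.ReLU):
    """2-hidden-layer MLP; the hidden width of 512 is an assumption, not from the paper."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), activation(),
        nn.Linear(hidden, hidden), activation(),
        nn.Linear(hidden, out_dim),
    )

state_dim, action_dim = 11, 3  # Hopper-v2 observation/action sizes (example only)

# Ensemble of 4 dynamics models: 2-layer ReLU MLPs mapping (s, a) -> next state.
dynamics_ensemble = [two_layer_mlp(state_dim + action_dim, state_dim)
                     for _ in range(4)]

# Policy: 2-layer tanh MLP mapping state -> action (mean).
policy = two_layer_mlp(state_dim, action_dim, activation=nn.Tanh)

# Adam is named in the paper; the learning rate here is an assumed value.
optimizers = [torch.optim.Adam(m.parameters(), lr=1e-3)
              for m in dynamics_ensemble]
```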