MOReL: Model-Based Offline Reinforcement Learning
Authors: Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments, we show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks. |
| Researcher Affiliation | Collaboration | Rahul Kidambi (Cornell University, Ithaca) rkidambi@cornell.edu; Aravind Rajeswaran (University of Washington, Seattle; Google Research, Brain Team) aravraj@cs.washington.edu; Praneeth Netrapalli (Microsoft Research, India) praneeth@microsoft.com; Thorsten Joachims (Cornell University, Ithaca) tj@cornell.edu |
| Pseudocode | Yes | Algorithm 1 MOReL: Model-Based Offline Reinforcement Learning (a hedged sketch of the core pessimistic-MDP step appears after this table) |
| Open Source Code | No | Project webpage: https://sites.google.com/view/morel (The webpage states 'Code coming soon!' indicating it was not available at the time of publication.) |
| Open Datasets | Yes | The tasks considered include Hopper-v2, HalfCheetah-v2, Ant-v2, and Walker2d-v2, which are illustrated in Figure 2. We consider five different logged datasets for each environment, totalling 20 environment-dataset combinations. Datasets are collected based on the work of Wu et al. [18], with each dataset containing the equivalent of 1 million timesteps of environment interaction. |
| Dataset Splits | No | The paper mentions using a 'static dataset of interactions' but does not specify training, validation, or test splits for this dataset in the traditional sense, as policies are evaluated via rollouts in the environment. |
| Hardware Specification | No | The paper mentions 'computing resources from the Cornell Graphite cluster' but does not provide specific details on CPU, GPU, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions software like OpenAI Gym [73], MuJoCo [74], and the Adam [68] optimizer, but does not provide specific version numbers for these or other libraries/frameworks. |
| Experiment Setup | No | The paper mentions using '2-layer ReLU-MLPs' for the dynamics models, a '2-layer tanh-MLP' for the policy, an 'ensemble of 4 dynamics models', and that results are averaged over '5 different random seeds' using 'the same hyperparameters'. However, it does not provide specific hyperparameter values such as learning rate, batch size, or optimizer settings (a hedged sketch of these architecture choices follows the algorithm sketch below). |
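
The pseudocode row above refers to Algorithm 1 of the paper, which learns a dynamics-model ensemble from the offline dataset, builds a pessimistic MDP (P-MDP) whose unknown state-action detector is based on ensemble disagreement, and then runs policy optimization inside that P-MDP. The following is a minimal sketch of the P-MDP step only, not the authors' implementation; the model interface, the disagreement threshold, and the penalty value are illustrative placeholders.

```python
# Minimal sketch of MOReL's pessimistic MDP (P-MDP) step, assuming an
# ensemble of learned dynamics models is already available.
import numpy as np


class PessimisticMDP:
    """Wraps a dynamics-model ensemble into a P-MDP with an absorbing HALT state."""

    def __init__(self, models, reward_fn, threshold, kappa):
        self.models = models          # list of callables: (state, action) -> next state
        self.reward_fn = reward_fn    # callable: (state, action) -> reward
        self.threshold = threshold    # disagreement threshold for the unknown-state detector
        self.kappa = kappa            # halting penalty magnitude (assumed large positive number)

    def step(self, state, action):
        # Query every ensemble member for its next-state prediction.
        predictions = np.stack([m(state, action) for m in self.models])

        # Unknown state-action detector: maximum pairwise disagreement across the ensemble.
        diffs = predictions[:, None, :] - predictions[None, :, :]
        disagreement = np.linalg.norm(diffs, axis=-1).max()

        if disagreement > self.threshold:
            # Unknown region: move to the absorbing HALT state with a large negative reward.
            return None, -self.kappa, True

        # Known region: behave like an ordinary learned-model step (one ensemble member).
        next_state = predictions[0]
        return next_state, self.reward_fn(state, action), False


# Toy usage with linear stand-ins for the learned models (illustrative only).
models = [lambda s, a, c=c: c * s + a for c in (0.99, 1.0, 1.01, 1.02)]
pmdp = PessimisticMDP(models, reward_fn=lambda s, a: -float(np.sum(a ** 2)),
                      threshold=0.5, kappa=100.0)
next_state, reward, halted = pmdp.step(np.ones(3), np.zeros(3))
```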
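
The experiment-setup row lists the only architecture details reported: 2-layer ReLU-MLP dynamics models, an ensemble of 4, and a 2-layer tanh-MLP policy. Below is a minimal PyTorch sketch of those choices under stated assumptions: the hidden width, the use of PyTorch, and the interpretation of "2-layer" as two hidden layers are not given in the paper; the state and action dimensions match Hopper-v2.

```python
# Sketch of the reported architecture choices; hidden width is an assumption.
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 11, 3, 256  # Hopper-v2 sizes; HIDDEN is a placeholder


def make_dynamics_model():
    # 2-layer ReLU MLP mapping a (state, action) pair to a predicted next state.
    return nn.Sequential(
        nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, STATE_DIM),
    )


# Ensemble of 4 dynamics models, as stated in the paper.
dynamics_ensemble = [make_dynamics_model() for _ in range(4)]

# 2-layer tanh MLP policy mapping a state to an action.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, HIDDEN), nn.Tanh(),
    nn.Linear(HIDDEN, HIDDEN), nn.Tanh(),
    nn.Linear(HIDDEN, ACTION_DIM),
)
```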