Model-Based Reinforcement Learning via Latent-Space Collocation
Authors: Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate long-horizon planning capabilities of LatCo for model-based reinforcement learning on several challenging manipulation and locomotion tasks. Each subsection below corresponds to a distinct scientific question that we study. We evaluate LatCo's performance in the standard model-based reinforcement learning setting, where the agent learns the model from scratch and collects the data online using the LatCo planner, according to Algorithm 1. We evaluate the deterministic LatCo and other model-based agents on the Sparse MetaWorld tasks from visual input without any state information. The performance is shown in Table 1 and learning curves in the Appendix. |
| Researcher Affiliation | Collaboration | Oleh Rybkin*1, Chuning Zhu*1, Anusha Nagabandi2, Kostas Daniilidis1, Igor Mordatch3, Sergey Levine4. *Equal contribution. 1University of Pennsylvania, 2Covariant, 3Google AI, 4UC Berkeley. Correspondence to: Oleh Rybkin <oleh@seas.upenn.edu>. |
| Pseudocode | Yes | Algorithm 1: Latent Collocation (LatCo). A hedged sketch of the planning loop this algorithm describes appears below the table. |
| Open Source Code | No | See the videos on the supplementary website https://orybkin.github.io/latco/. The paper explicitly states the website is for videos, not for source code release. |
| Open Datasets | Yes | To evaluate on challenging visual planning tasks, we adapt the MetaWorld benchmark (Yu et al., 2020) to visual observations and sparse rewards... In addition, we evaluate on the standard continuous control tasks with shaped rewards from the DeepMind Control Suite (Tassa et al., 2020). |
| Dataset Splits | No | The paper describes training parameters and protocols (e.g., 'Ttot = 150', 'action repeat of 2'), and uses standard benchmarks, but does not explicitly specify train/validation/test dataset splits with percentages or counts for reproduction. |
| Hardware Specification | No | The paper does not mention any specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions general software components like 'convolutional neural networks' and 'Adam optimizer', but does not provide specific version numbers for libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We use H = 30, Tcache = 30, Ttot = 150 for all tasks except Pushing, Thermos, and Hammer, where we use H = 50, Tcache = 25, Ttot = 150. In addition, we evaluate on the standard continuous control tasks with shaped rewards from the DeepMind Control Suite (Tassa et al., 2020). According to the protocol from (Hafner et al., 2020), we use an action repeat of 2 and set H = 12, Tcache = 6, Ttot = 1000 for all DMC tasks. The hyperparameters are detailed in Appendix A. These values are collected in the config sketch below the table. |
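
The Pseudocode row points to Algorithm 1 (LatCo), which plans by jointly optimizing a sequence of latent states and actions so that predicted reward is maximized while the states remain consistent with the learned latent dynamics. The following PyTorch sketch is a minimal, hedged reconstruction of that idea, not the authors' implementation: the `dynamics` and `reward` callables, the `action_dim` argument, the use of Adam, and the penalty schedule on the dynamics constraint are all assumptions made for illustration.

```python
import torch

def latco_plan(dynamics, reward, z0, action_dim, horizon=30, iters=100, lr=1e-1):
    """Minimal latent-collocation sketch (illustrative, not the paper's code).

    Jointly optimizes latent states z[1..H] and actions a[1..H] to maximize
    predicted reward, with a squared penalty that pushes the state sequence
    toward consistency with the learned latent dynamics.
    `dynamics(z, a)` and `reward(z)` stand in for the learned model.
    """
    latent_dim = z0.shape[-1]
    z = torch.zeros(horizon, latent_dim, requires_grad=True)   # planned latent states
    a = torch.zeros(horizon, action_dim, requires_grad=True)   # planned actions
    lam = 1.0                                                   # dynamics-penalty weight
    opt = torch.optim.Adam([z, a], lr=lr)

    for _ in range(iters):
        opt.zero_grad()
        # Predecessor states: the current latent z0 followed by the planned states.
        prev = torch.cat([z0.unsqueeze(0), z[:-1]], dim=0)
        # Per-step violation of the latent dynamics constraint z_{t+1} = f(z_t, a_t).
        violation = ((z - dynamics(prev, a)) ** 2).sum(-1)
        # Maximize reward while penalizing dynamics violations.
        loss = -reward(z).sum() + lam * violation.sum()
        loss.backward()
        opt.step()
        lam *= 1.05   # gradually tighten the constraint (assumed schedule)

    return a.detach()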
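
The Experiment Setup row reports per-task planning hyperparameters. Gathering them into a plain config dictionary makes the groupings explicit; the keys (`H`, `T_cache`, `T_total`, `action_repeat`) and group names are illustrative labels, and only the numeric values come from the quoted text.

```python
# Planning hyperparameters as quoted in the Experiment Setup row above.
# Key names and grouping are assumptions; the numbers are from the paper.
PLANNING_CONFIG = {
    "metaworld_default": {"H": 30, "T_cache": 30, "T_total": 150},
    "metaworld_pushing_thermos_hammer": {"H": 50, "T_cache": 25, "T_total": 150},
    "dmc_all_tasks": {"H": 12, "T_cache": 6, "T_total": 1000, "action_repeat": 2},
}
```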