Model-Based Reinforcement Learning via Latent-Space Collocation

Authors: Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate long-horizon planning capabilities of LatCo for model-based reinforcement learning on several challenging manipulation and locomotion tasks. Each subsection below corresponds to a distinct scientific question that we study. We evaluate LatCo's performance in the standard model-based reinforcement learning setting, where the agent learns the model from scratch and collects the data online using the LatCo planner, according to Algorithm 1. We evaluate the deterministic LatCo and other model-based agents on the Sparse Meta-World tasks from visual input without any state information. The performance is shown in Table 1 and learning curves in the Appendix.
Researcher Affiliation | Collaboration | Oleh Rybkin*1, Chuning Zhu*1, Anusha Nagabandi2, Kostas Daniilidis1, Igor Mordatch3, Sergey Levine4 (*equal contribution; 1University of Pennsylvania, 2Covariant, 3Google AI, 4UC Berkeley). Correspondence to: Oleh Rybkin <oleh@seas.upenn.edu>.
Pseudocode | Yes | Algorithm 1: Latent Collocation (LatCo)
Open Source Code | No | See the videos on the supplementary website https://orybkin.github.io/latco/. The paper explicitly states the website is for videos, not for source code release.
Open Datasets | Yes | To evaluate on challenging visual planning tasks, we adapt the Meta-World benchmark (Yu et al., 2020) to visual observations and sparse rewards... In addition, we evaluate on the standard continuous control tasks with shaped rewards from the DeepMind Control Suite (Tassa et al., 2020).
Dataset Splits | No | The paper describes training parameters and protocols (e.g., 'T_total = 150', 'action repeat of 2') and uses standard benchmarks, but does not explicitly specify train/validation/test dataset splits with percentages or counts for reproduction.
Hardware Specification | No | The paper does not mention any specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments.
Software Dependencies | No | The paper mentions general software components like 'convolutional neural networks' and the 'Adam optimizer', but does not provide specific version numbers for libraries, frameworks, or programming languages used.
Experiment Setup | Yes | We use H = 30, T_cache = 30, T_total = 150 for all tasks except Pushing, Thermos, and Hammer, where we use H = 50, T_cache = 25, T_total = 150. In addition, we evaluate on the standard continuous control tasks with shaped rewards from the DeepMind Control Suite (Tassa et al., 2020). According to the protocol from (Hafner et al., 2020), we use an action repeat of 2 and set H = 12, T_cache = 6, T_total = 1000 for all DMC tasks. The hyperparameters are detailed in Appendix A.
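The Research Type row above quotes the paper's standard online model-based RL protocol: the model is learned from scratch while data is collected with the LatCo planner, following Algorithm 1. Below is a minimal Python sketch of that outer loop; the callables `plan_fn` and `train_fn` and the list-based buffer are illustrative assumptions, not the paper's actual interface.

```python
# Sketch of the online model-based RL loop: collect episodes with the planner,
# then refit the latent dynamics/reward model on all data gathered so far.
# `plan_fn`, `train_fn`, and the list buffer are hypothetical placeholders.
def online_mbrl_loop(env, plan_fn, train_fn, num_episodes):
    buffer = []                                # replay buffer of transitions
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        while not done:
            action = plan_fn(obs)              # plan in latent space from the current observation
            next_obs, reward, done, _ = env.step(action)
            buffer.append((obs, action, reward, next_obs, done))
            obs = next_obs
        train_fn(buffer)                       # update the model on the data collected so far
    return buffer
```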
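The Pseudocode row refers to Algorithm 1, Latent Collocation (LatCo). The sketch below illustrates the core idea of collocation in a learned latent space: jointly optimizing a sequence of latent states and actions so that predicted reward is high while violations of the latent dynamics are penalized. Plain gradient descent with an increasing penalty weight is used here as a stand-in for the paper's constrained optimization; all names and hyperparameters are assumptions for illustration.

```python
# Minimal latent-collocation sketch (not the paper's implementation): optimize
# latent states z_1..z_H and actions a_1..a_H so that predicted reward is high
# and each z_t stays consistent with the learned dynamics f(z_{t-1}, a_t).
import jax
import jax.numpy as jnp

def latco_plan(f, r, z0, horizon, z_dim, a_dim, steps=300, lr=1e-2, penalty=1.0, seed=0):
    key_z, key_a = jax.random.split(jax.random.PRNGKey(seed))
    zs = 0.1 * jax.random.normal(key_z, (horizon, z_dim))    # decision variables: latent states
    acts = 0.1 * jax.random.normal(key_a, (horizon, a_dim))  # decision variables: actions

    def objective(params, penalty):
        zs, acts = params
        z_prev = jnp.concatenate([z0[None], zs[:-1]], axis=0)
        dyn_violation = jnp.sum((zs - jax.vmap(f)(z_prev, acts)) ** 2)
        total_reward = jnp.sum(jax.vmap(r)(zs))
        return -total_reward + penalty * dyn_violation

    grad_fn = jax.jit(jax.grad(objective))
    params = (zs, acts)
    for step in range(steps):
        g_z, g_a = grad_fn(params, penalty)
        params = (params[0] - lr * g_z, params[1] - lr * g_a)
        if (step + 1) % 100 == 0:
            penalty *= 10.0        # tighten the dynamics constraint over time
    return params                  # execute the first action(s), then replan
```

The simple penalty schedule above only stands in for the paper's constraint handling; the point of the sketch is that collocation optimizes the latent states themselves rather than shooting through the dynamics from the initial state.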
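The Open Datasets row notes that Meta-World was adapted to visual observations and sparse rewards. A sketch of the sparse-reward part of such an adaptation is shown below, assuming a gym-style environment that reports task completion through info["success"] (as Meta-World environments do); the wrapper name is hypothetical.

```python
import gym

class SparseRewardWrapper(gym.Wrapper):
    """Replace a shaped reward with a 0/1 success signal (illustrative sketch)."""
    def step(self, action):
        obs, _, done, info = self.env.step(action)
        reward = float(info.get("success", 0.0))  # 1.0 only when the task is solved
        return obs, reward, done, info
```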
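For reference, the planning hyperparameters quoted in the Experiment Setup row can be collected as follows. The dictionary layout and key names are illustrative, and reading T_cache as the number of planned actions executed before replanning is an assumption; Appendix A of the paper has the authoritative list.

```python
# Planner settings quoted from the paper's experiment setup (illustrative layout).
PLANNER_SETTINGS = {
    "metaworld_default":      {"H": 30, "T_cache": 30, "T_total": 150},
    "metaworld_long_horizon": {"H": 50, "T_cache": 25, "T_total": 150},   # Pushing, Thermos, Hammer
    "dmc":                    {"H": 12, "T_cache": 6,  "T_total": 1000, "action_repeat": 2},
}
```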