Simplified Temporal Consistency Reinforcement Learning
Authors: Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1× faster to train than ensemble-based methods. In model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches the sample efficiency of model-based methods while training 2.4× faster. |
| Researcher Affiliation | Academia | Yi Zhao¹, Wenshuai Zhao¹, Rinu Boney², Juho Kannala², Joni Pajarinen¹. ¹Department of Electrical Engineering and Automation, Aalto University, Finland; ²Department of Computer Science, Aalto University, Finland. |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Source Code of TCRL: https://github.com/zhaoyi11/tcrl |
| Open Datasets | Yes | We evaluate our TCRL approach in several continuous DMC control tasks. DMC uses scaled rewards... (Tunyasuvunakool et al., 2020) |
| Dataset Splits | No | The paper uses standard benchmark environments (DMC) but does not explicitly provide specific details (percentages, counts, or methodology) for how data is partitioned into training, validation, and test sets for their experiments. |
| Hardware Specification | Yes | We evaluate the training time on a single RTX 2080Ti GPU. |
| Software Dependencies | No | The paper states: "We re-implement PlaNet using PyTorch (Paszke et al., 2019)." This names the software and cites its originating paper, but it does not give a PyTorch version number (e.g., 1.9) or versions for any other software components. |
| Experiment Setup | Yes | Table 2. Important Hyperparameters used in TCRL and TCRL-dynamics. [...] Batch size: 512; learning rate: 3e-4; optimizer: Adam; latent dimension: 100 (Dog, Humanoid), 50 (otherwise); momentum coefficient (τ): 0.005; discount (γ): 0.99; rollout horizon (H): 5; rollout discount: 0.9; N-step TD: 3; reward coefficient: 1; temporal coefficient: 1. |
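Table 2 lists the hyperparameter values but not how each one is consumed. As a hedged illustration, the sketch below collects the Table 2 values in a config object and shows the standard roles usually associated with three of them: τ as a Polyak (EMA) coefficient for target networks, the rollout discount as a per-step weight over an H-step latent rollout, and N and γ in an N-step TD target. All class, field, and function names, the hypothetical "domain-task" string format, and the exact weighting scheme are assumptions for illustration, not code taken from the TCRL repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TCRLConfig:
    """Values transcribed from Table 2; field names are hypothetical."""
    batch_size: int = 512
    learning_rate: float = 3e-4
    optimizer: str = "adam"
    latent_dim_large: int = 100   # Dog, Humanoid
    latent_dim_default: int = 50  # all other tasks
    tau: float = 0.005            # momentum coefficient for target networks
    discount: float = 0.99        # gamma
    rollout_horizon: int = 5      # H
    rollout_discount: float = 0.9
    n_step: int = 3
    reward_coef: float = 1.0
    temporal_coef: float = 1.0

    def latent_dim(self, task: str) -> int:
        # Assumed "domain-task" naming, e.g. "humanoid-run" (hypothetical format).
        domain = task.split("-")[0].lower()
        if domain in ("dog", "humanoid"):
            return self.latent_dim_large
        return self.latent_dim_default


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def rollout_weights(horizon: int, rollout_discount: float) -> List[float]:
    """Per-step loss weights rho**t for t = 0..H-1 (assumed weighting scheme)."""
    return [rollout_discount ** t for t in range(horizon)]


def n_step_target(rewards: List[float], bootstrap_value: float, gamma: float) -> float:
    """Standard N-step TD target: sum_k gamma**k * r_k + gamma**N * V(s_N)."""
    n = len(rewards)
    ret = sum(gamma ** k * r for k, r in enumerate(rewards))
    return ret + gamma ** n * bootstrap_value


if __name__ == "__main__":
    cfg = TCRLConfig()
    print(cfg.latent_dim("humanoid-run"), cfg.latent_dim("cheetah-run"))  # 100 50
    print(soft_update([0.0, 1.0], [1.0, 0.0], cfg.tau))
    print(rollout_weights(cfg.rollout_horizon, cfg.rollout_discount))
    print(n_step_target([1.0] * cfg.n_step, bootstrap_value=10.0, gamma=cfg.discount))
```

With τ = 0.005, each update moves the target parameters only 0.5% of the way toward the online parameters, and with a rollout discount of 0.9 the fifth rollout step is weighted by 0.9⁴ ≈ 0.66 relative to the first.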