Simplified Temporal Consistency Reinforcement Learning

Authors: Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1× faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks, such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4× faster." (A sketch of this kind of temporal-consistency objective follows the table.)
Researcher Affiliation | Academia | Yi Zhao¹, Wenshuai Zhao¹, Rinu Boney², Juho Kannala², Joni Pajarinen¹. ¹Department of Electrical Engineering and Automation, Aalto University, Finland; ²Department of Computer Science, Aalto University, Finland.
Pseudocode | No | The paper describes the methodology using text and mathematical equations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code | Yes | Source code of TCRL: https://github.com/zhaoyi11/tcrl
Open Datasets | Yes | "We evaluate our TCRL approach in several continuous DMC control tasks. DMC uses scaled rewards..." (Tunyasuvunakool et al., 2020)
Dataset Splits | No | The paper uses standard benchmark environments (DMC) but does not explicitly provide specific details (percentages, counts, or methodology) for how data is partitioned into training, validation, and test sets for their experiments.
Hardware Specification | Yes | "We evaluate the training time on a single RTX 2080Ti GPU."
Software Dependencies | No | The paper states: "We re-implement PlaNet using PyTorch (Paszke et al., 2019)." This names a software package and cites its originating paper but does not provide a specific version number for PyTorch (e.g., 1.9) or for other software components.
Experiment Setup | Yes | Table 2 ("Important Hyperparameters used in TCRL and TCRL-dynamics") [...]: Batch size 512; Learning rate 3e-4; Optimizer Adam; Latent dimension 100 (Dog, Humanoid) / 50 (otherwise); Momentum coefficient (τ) 0.005; Discount (γ) 0.99; Rollout horizon (H) 5; Rollout discount 0.9; N-step TD 3; Reward coefficient 1; Temporal coefficient 1. (The sketches below instantiate several of these values.)
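
As a reading aid for the Research Type row, here is a minimal PyTorch sketch of a temporal-consistency objective of the kind the paper describes: a latent dynamics model is rolled out from the encoded first observation, and each predicted latent is pulled toward the output of a momentum (EMA) target encoder on the true next observation. The MLP widths, observation/action dimensions, and the negative-cosine latent loss are assumptions on my part; the horizon H, rollout discount, momentum τ, and unit loss coefficients come from Table 2. The repository linked in the Open Source Code row is the authoritative implementation.

```python
# Minimal sketch of a temporal-consistency objective of the kind TCRL describes.
# Not the authors' code; see https://github.com/zhaoyi11/tcrl for the reference
# implementation. Network sizes and the cosine loss are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, LATENT_DIM = 24, 6, 50   # 50 per Table 2 (non-Dog/Humanoid tasks)
H, RHO, TAU = 5, 0.9, 0.005                # rollout horizon, rollout discount, momentum

encoder = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ELU(), nn.Linear(256, LATENT_DIM))
target_encoder = copy.deepcopy(encoder)    # EMA copy, updated with momentum TAU
dynamics = nn.Sequential(nn.Linear(LATENT_DIM + ACT_DIM, 256), nn.ELU(),
                         nn.Linear(256, LATENT_DIM))
reward_head = nn.Sequential(nn.Linear(LATENT_DIM + ACT_DIM, 256), nn.ELU(),
                            nn.Linear(256, 1))

def temporal_consistency_loss(obs_seq, act_seq, rew_seq):
    """obs_seq: (H+1, B, OBS_DIM); act_seq: (H, B, ACT_DIM); rew_seq: (H, B)."""
    z = encoder(obs_seq[0])                          # latent rollout starts from o_0
    consistency, reward_loss = 0.0, 0.0
    for t in range(H):
        za = torch.cat([z, act_seq[t]], dim=-1)
        z = dynamics(za)                             # predict the next latent state
        with torch.no_grad():                        # target from the EMA encoder
            z_target = target_encoder(obs_seq[t + 1])
        # Negative cosine similarity pulls the prediction toward the target latent
        consistency += RHO ** t * -F.cosine_similarity(z, z_target, dim=-1).mean()
        reward_loss += RHO ** t * F.mse_loss(reward_head(za).squeeze(-1), rew_seq[t])
    return consistency + reward_loss                 # both coefficients are 1 (Table 2)

@torch.no_grad()
def update_target():
    """EMA update with momentum TAU: p_t <- (1 - TAU) * p_t + TAU * p."""
    for p, p_t in zip(encoder.parameters(), target_encoder.parameters()):
        p_t.lerp_(p, TAU)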
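
The Experiment Setup row also lists "N-step TD 3" with γ = 0.99 for value learning. A hypothetical helper (the function name, argument layout, and done-masking convention are mine, not from the paper) illustrating how such an n-step TD target is typically formed:

```python
import torch

GAMMA, N_STEP = 0.99, 3  # discount and n-step TD from Table 2

def n_step_td_target(rewards: torch.Tensor, dones: torch.Tensor,
                     bootstrap_value: torch.Tensor) -> torch.Tensor:
    """rewards, dones: (N_STEP, B); bootstrap_value: (B,), e.g. a target-critic
    estimate N_STEP steps ahead. Returns the (B,) target
    r_t + gamma*r_{t+1} + ... + gamma^N_STEP * V, truncated at episode ends."""
    target = bootstrap_value
    for t in reversed(range(N_STEP)):
        target = rewards[t] + GAMMA * (1.0 - dones[t]) * target
    return target
```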