Simplified Temporal Consistency Reinforcement Learning

Authors: Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1× faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks, such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4× faster." (A sketch of this kind of temporal-consistency objective follows the table.)
Researcher Affiliation | Academia | Yi Zhao¹, Wenshuai Zhao¹, Rinu Boney², Juho Kannala², Joni Pajarinen¹. ¹Department of Electrical Engineering and Automation, Aalto University, Finland; ²Department of Computer Science, Aalto University, Finland.
Pseudocode | No | The paper describes the methodology using text and mathematical equations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code | Yes | Source code of TCRL: https://github.com/zhaoyi11/tcrl
Open Datasets | Yes | "We evaluate our TCRL approach in several continuous DMC control tasks. DMC uses scaled rewards..." (Tunyasuvunakool et al., 2020)
Dataset Splits | No | The paper uses standard benchmark environments (DMC) but does not explicitly provide specific details (percentages, counts, or methodology) for how data is partitioned into training, validation, and test sets for their experiments.
Hardware Specification | Yes | "We evaluate the training time on a single RTX 2080Ti GPU."
Software Dependencies | No | The paper states: "We re-implement PlaNet using PyTorch (Paszke et al., 2019)." This names a software package and cites its originating paper but does not provide a specific version number for PyTorch (e.g., 1.9) or for other software components.
Experiment Setup | Yes | Table 2 ("Important Hyperparameters used in TCRL and TCRL-dynamics") [...]: Batch size 512; Learning rate 3e-4; Optimizer Adam; Latent dimension 100 (Dog, Humanoid) / 50 (otherwise); Momentum coefficient (τ) 0.005; Discount (γ) 0.99; Rollout horizon (H) 5; Rollout discount 0.9; N-step TD 3; Reward coefficient 1; Temporal coefficient 1. (The sketches below instantiate several of these values.)
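
As a reading aid for the Research Type row, here is a minimal PyTorch sketch of a temporal-consistency objective of the kind the paper describes: a latent dynamics model is rolled out from the encoded first observation, and each predicted latent is pulled toward the output of a momentum (EMA) target encoder on the true next observation. The MLP widths, observation/action dimensions, and the negative-cosine latent loss are assumptions on my part; the horizon H, rollout discount, momentum τ, and unit loss coefficients come from Table 2. The repository linked in the Open Source Code row is the authoritative implementation.

```python
# Minimal sketch of a temporal-consistency objective of the kind TCRL describes.
# Not the authors' code; see https://github.com/zhaoyi11/tcrl for the reference
# implementation. Network sizes and the cosine loss are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, LATENT_DIM = 24, 6, 50   # 50 per Table 2 (non-Dog/Humanoid tasks)
H, RHO, TAU = 5, 0.9, 0.005                # rollout horizon, rollout discount, momentum

encoder = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ELU(), nn.Linear(256, LATENT_DIM))
target_encoder = copy.deepcopy(encoder)    # EMA copy, updated with momentum TAU
dynamics = nn.Sequential(nn.Linear(LATENT_DIM + ACT_DIM, 256), nn.ELU(),
                         nn.Linear(256, LATENT_DIM))
reward_head = nn.Sequential(nn.Linear(LATENT_DIM + ACT_DIM, 256), nn.ELU(),
                            nn.Linear(256, 1))

def temporal_consistency_loss(obs_seq, act_seq, rew_seq):
    """obs_seq: (H+1, B, OBS_DIM); act_seq: (H, B, ACT_DIM); rew_seq: (H, B)."""
    z = encoder(obs_seq[0])                          # latent rollout starts from o_0
    consistency, reward_loss = 0.0, 0.0
    for t in range(H):
        za = torch.cat([z, act_seq[t]], dim=-1)
        z = dynamics(za)                             # predict the next latent state
        with torch.no_grad():                        # target from the EMA encoder
            z_target = target_encoder(obs_seq[t + 1])
        # Negative cosine similarity pulls the prediction toward the target latent
        consistency += RHO ** t * -F.cosine_similarity(z, z_target, dim=-1).mean()
        reward_loss += RHO ** t * F.mse_loss(reward_head(za).squeeze(-1), rew_seq[t])
    return consistency + reward_loss                 # both coefficients are 1 (Table 2)

@torch.no_grad()
def update_target():
    """EMA update with momentum TAU: p_t <- (1 - TAU) * p_t + TAU * p."""
    for p, p_t in zip(encoder.parameters(), target_encoder.parameters()):
        p_t.lerp_(p, TAU)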
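
The Experiment Setup row also lists "N-step TD 3" with γ = 0.99 for value learning. A hypothetical helper (the function name, argument layout, and done-masking convention are mine, not from the paper) illustrating how such an n-step TD target is typically formed:

```python
import torch

GAMMA, N_STEP = 0.99, 3  # discount and n-step TD from Table 2

def n_step_td_target(rewards: torch.Tensor, dones: torch.Tensor,
                     bootstrap_value: torch.Tensor) -> torch.Tensor:
    """rewards, dones: (N_STEP, B); bootstrap_value: (B,), e.g. a target-critic
    estimate N_STEP steps ahead. Returns the (B,) target
    r_t + gamma*r_{t+1} + ... + gamma^N_STEP * V, truncated at episode ends."""
    target = bootstrap_value
    for t in reversed(range(N_STEP)):
        target = rewards[t] + GAMMA * (1.0 - dones[t]) * target
    return target
```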