Learning State Representations via Retracing in Reinforcement Learning

Authors: Changmin Yu, Dong Li, Jianye Hao, Jun Wang, Neil Burgess

ICLR 2022

Reproducibility assessment: each variable below lists the extracted result and the supporting LLM response.

Research Type: Experimental
Evidence: "Through extensive empirical studies on visual-based continuous control benchmarks, we demonstrate that CCWM achieves state-of-the-art performance in terms of sample efficiency and asymptotic performance, whilst exhibiting behaviours that are indicative of stronger representation learning."

Researcher Affiliation: Collaboration
Evidence: "Changmin Yu (1), Dong Li (2), Jianye Hao (3, 2), Jun Wang (1, 2), Neil Burgess (1); (1) UCL, London, United Kingdom; (2) Huawei Noah's Ark Lab; (3) College of Intelligence and Computing, Tianjin University"

Pseudocode: Yes
Evidence: "The pseudocode for CCWM training is shown in Algorithm 1."

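Algorithm 1 itself is not reproduced in this report. As an illustration only, below is a minimal sketch of what one CCWM-style update could look like, assuming a latent world model trained with a standard model loss plus a λ-weighted retracing auxiliary loss (Eq. 8); world_model, model_loss, retrace_loss, and replay_buffer are hypothetical names, not the authors' API.

```python
import tensorflow as tf

# Hypothetical sketch of one training step; NOT the authors' Algorithm 1.
def train_step(world_model, optimizer, replay_buffer,
               batch_size=64, seq_len=50, lam=1.0):
    # Sample a batch of 50-step trajectories, as described in the paper.
    batch = replay_buffer.sample(batch_size, seq_len)

    with tf.GradientTape() as tape:
        model_loss = world_model.model_loss(batch)      # forward world-model objective
        retrace_loss = world_model.retrace_loss(batch)  # retracing auxiliary loss
        loss = model_loss + lam * retrace_loss          # lam = λ = 1.0 by default

    grads = tape.gradient(loss, world_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, world_model.trainable_variables))
    return loss
```
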
Open Source Code: Yes
Evidence: "The Python implementation of CCWM can be found at https://github.com/changmin-yu/CCWM_code."

Open Datasets: Yes
Evidence: "We base our experimental studies on the challenging visual-based continuous control benchmarks, for which we choose 8 tasks from the DeepMind Control Suite (Tassa et al., 2018; Figure 3a)."

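For reference, DeepMind Control Suite tasks are loaded through the open-source dm_control package. A minimal sketch follows; the cheetah-run choice is illustrative and not necessarily one of the paper's 8 tasks.

```python
from dm_control import suite

# Load a continuous-control task (domain/task choice is illustrative).
env = suite.load(domain_name="cheetah", task_name="run")

time_step = env.reset()
episode_return = 0.0
while not time_step.last():
    # Spec-conforming placeholder action; a trained policy would go here.
    action = env.action_spec().generate_value()
    time_step = env.step(action)
    episode_return += time_step.reward
print(episode_return)
```
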
Dataset Splits: No
The paper states that 'Greedy evaluation is performed every 10^4 training steps' and that 'The reported evaluation scores are averaged values over 5 random seeds', but it does not explicitly provide training/validation/test dataset splits (e.g., percentages or sample counts per split).

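The stated evaluation schedule could be organised as in the sketch below, which assumes hypothetical agent and dm_env-style environment interfaces; the number of episodes per evaluation is an assumption, as the paper excerpt does not state it.

```python
import numpy as np

EVAL_EVERY = 10_000  # greedy evaluation every 10^4 training steps
NUM_SEEDS = 5        # reported scores are averages over 5 random seeds

def greedy_eval(agent, env, num_episodes=10):  # num_episodes is assumed
    """Mean episode return of the deterministic (greedy) policy."""
    returns = []
    for _ in range(num_episodes):
        time_step, total = env.reset(), 0.0
        while not time_step.last():
            action = agent.act(time_step.observation, greedy=True)
            time_step = env.step(action)
            total += time_step.reward
        returns.append(total)
    return float(np.mean(returns))

# Reported curves: at each evaluation point, average the per-seed scores, e.g.
# score = np.mean([greedy_eval(agents[s], envs[s]) for s in range(NUM_SEEDS)])
```
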
Hardware Specification: No
The paper notes that the implementation uses TensorFlow but does not provide hardware details such as GPU models, CPU types, or cloud computing specifications used to run the experiments.

Software Dependencies: No
The paper mentions the use of TensorFlow and TensorFlow Distributions but does not specify their version numbers, nor does it list other software dependencies with versions.

Experiment Setup: Yes
Evidence: "For the actual training, the batch size is chosen to be 64, and all sampled trajectories are taken to be 50 timesteps long... The parameter λ controlling the weights of the retrace auxiliary loss in Eq. 8 is set to 1.0. The discounting factor for the expected value function is set to 0.99. The default values for the parameters we used for the empirical evaluation shown in Figure 5 are: η = 0.10, S = 10, τ = 5, ξ = 1 × 10^-5."

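Collected in one place, the stated defaults could be written as a configuration dictionary. Key names are illustrative, and ξ is read as 1 × 10^-5 from the garbled source; only the values come from the paper.

```python
# Default CCWM hyperparameters as reported in the paper (key names are ours).
CCWM_DEFAULTS = {
    "batch_size": 64,        # trajectories per training batch
    "sequence_length": 50,   # timesteps per sampled trajectory
    "retrace_weight": 1.0,   # λ, weight of the retrace auxiliary loss (Eq. 8)
    "discount": 0.99,        # discount factor for the expected value function
    "eta": 0.10,             # η
    "S": 10,
    "tau": 5,                # τ
    "xi": 1e-5,              # ξ, assuming the source's "1 105" means 1 × 10^-5
}
```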