Information asymmetry in KL-regularized RL
Authors: Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results in both discrete and continuous action domains and demonstrate that, for certain tasks, learning a default policy alongside the policy can significantly speed up and improve learning. |
| Researcher Affiliation | Industry | Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess. DeepMind, London, UK. {agalashov,sidmj,leonardh,dhruvat,schwarzjn,gdesjardins,lejlot,ywteh,razp,heess}@google.com |
| Pseudocode | Yes | In Algorithm 1 we provide pseudo-code for an actor-critic version of the algorithm with K-step returns, and Algorithm 2 is an off-policy version of Algorithm 1 with a Retrace-estimated Q-function. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses the DMLab-30 suite of environments, citing Beattie et al. (2016); that citation describes the environment itself rather than providing concrete access information (link, DOI, specific repository, or dataset citation) for a fixed dataset used in training. The continuous control experiments likewise use simulated environments rather than external datasets. |
| Dataset Splits | No | The paper does not provide specific percentages or sample counts for training, validation, or test dataset splits, nor does it reference predefined splits with explicit citations for data partitioning. |
| Hardware Specification | No | The paper mentions a distributed actor-learner architecture with a varying number of actors but does not specify any particular hardware details such as GPU models, CPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions various algorithms and models used (e.g., SVG(0), V-trace, ResNet, LSTM) but does not provide specific version numbers for any software dependencies or libraries required for replication. |
| Experiment Setup | Yes | The paper provides detailed hyperparameter settings in Appendix D.2, including actor and critic learning rates, network sizes, batch size, unroll length, entropy bonus, and regularization constants for various tasks. |
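
The Pseudocode row above refers to a KL-regularized actor-critic update with K-step returns, in which the reward is augmented by a penalty for diverging from a learned default policy that receives restricted observations. Below is a minimal sketch of that regularized K-step return for a discrete action space; the function names, the fixed coefficient `alpha`, and the categorical KL computation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def categorical_kl(pi, pi0, eps=1e-8):
    """KL(pi || pi_0) between two categorical action distributions.
    pi:  agent policy probabilities,   shape [num_actions]
    pi0: default policy probabilities, shape [num_actions]
    """
    return float(np.sum(pi * (np.log(pi + eps) - np.log(pi0 + eps))))

def kl_regularized_k_step_return(rewards, kls, bootstrap_value,
                                 gamma=0.99, alpha=0.1):
    """K-step return in which each reward r_t is replaced by
    r_t - alpha * KL(pi(.|x_t) || pi_0(.|x_t^D)), following the
    KL-regularized objective; alpha here is a hypothetical fixed coefficient."""
    ret = bootstrap_value
    for r, kl in zip(reversed(rewards), reversed(kls)):
        ret = (r - alpha * kl) + gamma * ret
    return ret

# Toy usage: a 3-step rollout with a 2-action policy.
pi = np.array([0.7, 0.3])    # agent policy conditioned on the full observation
pi0 = np.array([0.5, 0.5])   # default policy conditioned on a restricted observation
kls = [categorical_kl(pi, pi0)] * 3
print(kl_regularized_k_step_return(rewards=[1.0, 0.0, 1.0],
                                   kls=kls, bootstrap_value=0.5))
```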