DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
Authors: Irina Higgins, Arka Pal, Andrei Rusu, Loic Matthey, Christopher Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, Alexander Lerchner
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of DARLA on different task and environment setups that probe subtly different aspects of domain adaptation. As a reminder, in Sec. 2.2 we defined $\hat{S}$ as a state space that contains all possible conjunctions of high-level factors of variation necessary to generate any naturalistic observation in any $D_i \in \mathcal{M}$. During domain adaptation scenarios agent observation states are generated according to $s^o_S \sim \mathrm{Sim}_S(\hat{s}_S)$ and $s^o_T \sim \mathrm{Sim}_T(\hat{s}_T)$ for the source and target domains respectively, where $\hat{s}_S$ and $\hat{s}_T$ are sampled by some distributions or processes $G_S$ and $G_T$ according to $\hat{s}_S \sim G_S(\hat{S})$ and $\hat{s}_T \sim G_T(\hat{S})$. [A toy sketch of this source/target sampling setup follows the table.] |
| Researcher Affiliation | Industry | DeepMind, 6 Pancras Square, Kings Cross, London, N1C 4AG, UK. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We use DeepMind Lab (Beattie et al., 2016) to test a version of domain adaptation setup... We use the Jaco arm with a matching MuJoCo simulation environment (Todorov et al., 2012) in two domain adaptation scenarios: simulation to simulation (sim2sim) and simulation to reality (sim2real). |
| Dataset Splits | No | The paper describes the use of source and target domains for training and testing, respectively, but does not provide the percentages or counts for training, validation, and test splits typically associated with dataset partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions various RL algorithms (DQN, A3C, Episodic Control) and models (β-VAE, DAE) but does not specify software dependencies or version numbers for their implementations. |
| Experiment Setup | Yes | The disentangled model utilised a significantly higher value of the hyperparameter β than the entangled model (see Appendix A.3 for further details), which constrains the capacity of the latent channel. [A sketch of the β-weighted objective follows the table.] |
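
The domain adaptation formalism quoted in the Research Type row can be made concrete with a toy sketch. The snippet below is purely illustrative and its names (`LATENT_FACTORS`, `sample_latent_state`, `render_source`, `render_target`) are our assumptions, not identifiers from the paper: a single latent state drawn from the shared space $\hat{S}$ is rendered into a source observation by $\mathrm{Sim}_S$ and a target observation by $\mathrm{Sim}_T$, which is what makes zero-shot transfer of a policy defined over the shared factors possible.

```python
import random

# Toy stand-in for the shared state space S-hat: conjunctions of
# high-level factors of variation common to both domains.
LATENT_FACTORS = {
    "object_colour": ["red", "green", "blue"],
    "object_position": [0, 1, 2, 3],
}

def sample_latent_state(rng):
    """G(S-hat): sample one conjunction of high-level factors."""
    return {k: rng.choice(v) for k, v in LATENT_FACTORS.items()}

def render_source(s_hat):
    """Sim_S: source-domain 'renderer' (e.g. simulation graphics)."""
    return f"sim[{s_hat['object_colour']}@{s_hat['object_position']}]"

def render_target(s_hat):
    """Sim_T: target-domain 'renderer' (e.g. real camera images)."""
    return f"real[{s_hat['object_colour']}@{s_hat['object_position']}]"

rng = random.Random(0)
s_hat = sample_latent_state(rng)
obs_source = render_source(s_hat)   # s^o_S ~ Sim_S(s-hat_S)
obs_target = render_target(s_hat)   # s^o_T ~ Sim_T(s-hat_T)
print(obs_source, obs_target)
```

Only the renderer differs between the two observations; an agent whose representation recovers the underlying factors can therefore act in the target domain without further training.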
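
Likewise, the Experiment Setup row notes that DARLA's disentangled model uses a significantly higher β, which constrains the capacity of the latent channel. Below is a minimal NumPy sketch of the standard β-VAE objective from Higgins et al. (2017); the function name and the choice of a Bernoulli reconstruction term are our assumptions, not details taken from this paper's appendix.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Standard beta-VAE objective, to be minimised (sketch, not the paper's code).

    x, x_recon : arrays in [0, 1] of shape (batch, dims); Bernoulli decoder assumed.
    mu, log_var: Gaussian posterior parameters of shape (batch, latents).
    beta       : KL weight; beta > 1 constrains the capacity of the latent
                 channel and encourages disentangled representations.
    """
    eps = 1e-8  # numerical stability inside the logs
    # Bernoulli negative log-likelihood (binary cross-entropy) per example
    recon = -np.sum(x * np.log(x_recon + eps)
                    + (1.0 - x) * np.log(1.0 - x_recon + eps), axis=1)
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) per example
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return float(np.mean(recon + beta * kl))
```

Setting beta = 1 recovers the ordinary VAE; DARLA's "disentangled" variant corresponds to a much larger β, trading reconstruction fidelity for a more factorised latent code.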