DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

Authors: Irina Higgins, Arka Pal, Andrei Rusu, Loic Matthey, Christopher Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, Alexander Lerchner

ICML 2017

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the performance of DARLA on different task and environment setups that probe subtly different aspects of domain adaptation. As a reminder, in Sec. 2.2 we defined Ŝ as a state space that contains all possible conjunctions of high-level factors of variation necessary to generate any naturalistic observation in any D_i ∈ M. During domain adaptation scenarios, agent observation states are generated according to s_o^S ~ Sim_S(ŝ^S) and s_o^T ~ Sim_T(ŝ^T) for the source and target domains respectively, where ŝ^S and ŝ^T are sampled by some distributions or processes G_S and G_T according to ŝ^S ~ G_S(Ŝ) and ŝ^T ~ G_T(Ŝ).
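The quoted setup (a shared high-level state space Ŝ, domain-specific sampling processes G_S/G_T, and domain-specific renderers Sim_S/Sim_T) can be sketched as a toy simulation. All names, factors, and distributions below are illustrative assumptions, not taken from the paper:

```python
import random

# Toy shared state space S_hat: conjunctions of high-level factors of variation.
S_HAT = [
    {"object": obj, "colour": col}
    for obj in ("cube", "ball")
    for col in ("red", "blue")
]

def G_source(states):
    # Hypothetical source-domain process: uniform over the shared state space.
    return random.choice(states)

def G_target(states):
    # Hypothetical target-domain process: a shifted distribution that
    # only ever presents red objects.
    return random.choice([s for s in states if s["colour"] == "red"])

def sim_source(s_hat):
    # Source renderer Sim_S: maps high-level factors to an observation.
    return f"sim[{s_hat['colour']} {s_hat['object']}]"

def sim_target(s_hat):
    # Target renderer Sim_T: same factors, different observation statistics.
    return f"real[{s_hat['colour']} {s_hat['object']}]"

# s_o^S ~ Sim_S(s_hat^S) with s_hat^S ~ G_S(S_hat), and likewise for the target.
s_o_S = sim_source(G_source(S_HAT))
s_o_T = sim_target(G_target(S_HAT))
```

The point of the construction is that source and target observations are generated from the same underlying factors, so a representation that recovers those factors should transfer zero-shot.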
Researcher Affiliation Industry DeepMind, 6 Pancras Square, Kings Cross, London, N1C 4AG, UK.
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets Yes We use DeepMind Lab (Beattie et al., 2016) to test a version of domain adaptation setup... We use the Jaco arm with a matching MuJoCo simulation environment (Todorov et al., 2012) in two domain adaptation scenarios: simulation to simulation (sim2sim) and simulation to reality (sim2real).
Dataset Splits No The paper describes the use of source and target domains for training and testing, respectively, but does not provide specific numerical percentages or counts for training, validation, and test splits typically associated with dataset partitioning.
Hardware Specification No The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory.
Software Dependencies No The paper mentions various RL algorithms (DQN, A3C, Episodic Control) and models (β-VAE, DAE) but does not specify their version numbers or other software dependencies with version details.
Experiment Setup Yes The disentangled model utilised a significantly higher value of the hyperparameter β than the entangled model (see Appendix A.3 for further details), which constrains the capacity of the latent channel.
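The role of β described in the quoted excerpt can be illustrated with a minimal sketch of the β-VAE objective (reconstruction term plus a β-weighted KL term against a unit Gaussian prior). The function below is a plain-Python illustration under assumed diagonal-Gaussian posteriors; the default β=4.0 is a placeholder, not the setting used in the paper:

```python
import math

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    # Reconstruction term: squared error (Gaussian likelihood up to constants).
    recon = sum((xr - xi) ** 2 for xr, xi in zip(x_recon, x))
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior with
    # mean `mu` and log-variance `log_var`, in closed form.
    kl = -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var))
    # Raising beta penalises the KL term more heavily, constraining the
    # capacity of the latent channel.
    return recon + beta * kl
```

With a fixed nonzero KL, a larger β yields a larger penalty, which is the capacity constraint the excerpt refers to.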