DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
Authors: Irina Higgins, Arka Pal, Andrei Rusu, Loic Matthey, Christopher Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, Alexander Lerchner
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of DARLA on different task and environment setups that probe subtly different aspects of domain adaptation. As a reminder, in Sec. 2.2 we defined $\hat{S}$ as a state space that contains all possible conjunctions of high-level factors of variation necessary to generate any naturalistic observation in any $D_i \in \mathcal{M}$. During domain adaptation scenarios agent observation states are generated according to $s^o_S \sim \mathrm{Sim}_S(\hat{s}_S)$ and $s^o_T \sim \mathrm{Sim}_T(\hat{s}_T)$ for the source and target domains respectively, where $\hat{s}_S$ and $\hat{s}_T$ are sampled by some distributions or processes $G_S$ and $G_T$ according to $\hat{s}_S \sim G_S(\hat{S})$ and $\hat{s}_T \sim G_T(\hat{S})$. [A toy sketch of this source/target sampling setup follows the table.] |
| Researcher Affiliation | Industry | DeepMind, 6 Pancras Square, Kings Cross, London, N1C 4AG, UK. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We use DeepMind Lab (Beattie et al., 2016) to test a version of domain adaptation setup... We use the Jaco arm with a matching MuJoCo simulation environment (Todorov et al., 2012) in two domain adaptation scenarios: simulation to simulation (sim2sim) and simulation to reality (sim2real). |
| Dataset Splits | No | The paper describes the use of source and target domains for training and testing, respectively, but does not provide the percentages or counts for training, validation, and test splits typically associated with dataset partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions various RL algorithms (DQN, A3C, Episodic Control) and models (β-VAE, DAE) but does not specify software dependencies or version numbers for their implementations. |
| Experiment Setup | Yes | The disentangled model utilised a significantly higher value of the hyperparameter β than the entangled model (see Appendix A.3 for further details), which constrains the capacity of the latent channel. [A sketch of the β-weighted objective follows the table.] |
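
The domain adaptation formalism quoted in the Research Type row can be made concrete with a toy sketch. The snippet below is purely illustrative and its names (`LATENT_FACTORS`, `sample_latent_state`, `render_source`, `render_target`) are our assumptions, not identifiers from the paper: a single latent state drawn from the shared space $\hat{S}$ is rendered into a source observation by $\mathrm{Sim}_S$ and a target observation by $\mathrm{Sim}_T$, which is what makes zero-shot transfer of a policy defined over the shared factors possible.

```python
import random

# Toy stand-in for the shared state space S-hat: conjunctions of
# high-level factors of variation common to both domains.
LATENT_FACTORS = {
    "object_colour": ["red", "green", "blue"],
    "object_position": [0, 1, 2, 3],
}

def sample_latent_state(rng):
    """G(S-hat): sample one conjunction of high-level factors."""
    return {k: rng.choice(v) for k, v in LATENT_FACTORS.items()}

def render_source(s_hat):
    """Sim_S: source-domain 'renderer' (e.g. simulation graphics)."""
    return f"sim[{s_hat['object_colour']}@{s_hat['object_position']}]"

def render_target(s_hat):
    """Sim_T: target-domain 'renderer' (e.g. real camera images)."""
    return f"real[{s_hat['object_colour']}@{s_hat['object_position']}]"

rng = random.Random(0)
s_hat = sample_latent_state(rng)
obs_source = render_source(s_hat)   # s^o_S ~ Sim_S(s-hat_S)
obs_target = render_target(s_hat)   # s^o_T ~ Sim_T(s-hat_T)
print(obs_source, obs_target)
```

Only the renderer differs between the two observations; an agent whose representation recovers the underlying factors can therefore act in the target domain without further training.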
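
Likewise, the Experiment Setup row notes that DARLA's disentangled model uses a significantly higher β, which constrains the capacity of the latent channel. Below is a minimal NumPy sketch of the standard β-VAE objective from Higgins et al. (2017); the function name and the choice of a Bernoulli reconstruction term are our assumptions, not details taken from this paper's appendix.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Standard beta-VAE objective, to be minimised (sketch, not the paper's code).

    x, x_recon : arrays in [0, 1] of shape (batch, dims); Bernoulli decoder assumed.
    mu, log_var: Gaussian posterior parameters of shape (batch, latents).
    beta       : KL weight; beta > 1 constrains the capacity of the latent
                 channel and encourages disentangled representations.
    """
    eps = 1e-8  # numerical stability inside the logs
    # Bernoulli negative log-likelihood (binary cross-entropy) per example
    recon = -np.sum(x * np.log(x_recon + eps)
                    + (1.0 - x) * np.log(1.0 - x_recon + eps), axis=1)
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) per example
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return float(np.mean(recon + beta * kl))
```

Setting beta = 1 recovers the ordinary VAE; DARLA's "disentangled" variant corresponds to a much larger β, trading reconstruction fidelity for a more factorised latent code.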