TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
Authors: Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that TreeQN and ATreeC outperform n-step DQN and A2C on a box-pushing task, as well as n-step DQN and value prediction networks (Oh et al., 2017) on multiple Atari games. Furthermore, we present ablation studies that demonstrate the effect of different auxiliary losses on learning transition models. |
| Researcher Affiliation | Academia | Gregory Farquhar (gregory.farquhar@cs.ox.ac.uk), Tim Rocktäschel (tim.rocktaschel@cs.ox.ac.uk), Maximilian Igl (maximilian.igl@cs.ox.ac.uk), Shimon Whiteson (shimon.whiteson@cs.ox.ac.uk), University of Oxford, United Kingdom |
| Pseudocode | No | The paper describes the algorithms in prose and mathematical formulations but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our implementations are based on OpenAI Baselines (Hesse et al., 2017). |
| Open Datasets | Yes | Atari. To demonstrate the general applicability of TreeQN and ATreeC to complex environments, we evaluate them on the Atari 2600 suite (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes using the Atari 2600 suite and a custom box-pushing environment but does not specify explicit train/validation/test dataset splits. For the box-pushing environment, levels are randomly generated for each episode, and for Atari, no specific dataset splits are mentioned. |
| Hardware Specification | Yes | The NVIDIA DGX-1 used for this research was donated by the NVIDIA Corporation. |
| Software Dependencies | No | The paper mentions that 'Our implementations are based on OpenAI Baselines (Hesse et al., 2017)' and that 'All experiments use RMSProp', but it does not specify version numbers for any software libraries or frameworks. |
| Experiment Setup | Yes | Full details of the experimental setup, as well as architecture and training hyperparameters, are given in the appendix. For instance, 'All experiments use RMSProp (Tieleman & Hinton, 2012) with a learning rate of 1e-4, a decay of α = 0.99, and ϵ = 1e-5.' and 'We use nsteps = 5 and nenvs = 16, for a total batch size of 80.' |
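The reported hyperparameters can be sketched concretely. The following is a minimal, hypothetical illustration of a scalar RMSProp update using exactly the values quoted from the paper (learning rate 1e-4, decay α = 0.99, ε = 1e-5) and the stated batch dimensions; the authors' actual implementation builds on OpenAI Baselines and is not released, so this is not their code.

```python
# Hyperparameters quoted in the paper's appendix (RMSProp; Tieleman & Hinton, 2012).
LR = 1e-4      # learning rate
ALPHA = 0.99   # decay of the squared-gradient running average
EPS = 1e-5     # numerical-stability constant

def rmsprop_step(param, grad, sq_avg):
    """One RMSProp update on a single scalar parameter.

    sq_avg is the exponential moving average of squared gradients.
    Returns the updated (param, sq_avg) pair.
    """
    sq_avg = ALPHA * sq_avg + (1.0 - ALPHA) * grad ** 2
    param = param - LR * grad / (sq_avg ** 0.5 + EPS)
    return param, sq_avg

# Batch size as stated: n_steps = 5 environment steps across n_envs = 16
# parallel environments, for 5 * 16 = 80 transitions per update.
N_STEPS, N_ENVS = 5, 16
batch_size = N_STEPS * N_ENVS
```

This sketch only fixes the optimizer constants and batch arithmetic; architecture details (tree depth, embedding sizes, auxiliary-loss weights) are given in the paper's appendix and are not reproduced here.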