TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

Authors: Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that TreeQN and ATreeC outperform n-step DQN and A2C on a box-pushing task, as well as n-step DQN and value prediction networks (Oh et al., 2017) on multiple Atari games. Furthermore, we present ablation studies that demonstrate the effect of different auxiliary losses on learning transition models.
Researcher Affiliation | Academia | Gregory Farquhar (gregory.farquhar@cs.ox.ac.uk), Tim Rocktäschel (tim.rocktaschel@cs.ox.ac.uk), Maximilian Igl (maximilian.igl@cs.ox.ac.uk), Shimon Whiteson (shimon.whiteson@cs.ox.ac.uk); University of Oxford, United Kingdom
Pseudocode | No | The paper describes the algorithms in prose and mathematical formulations but does not present any structured pseudocode or algorithm blocks. (A hedged sketch of the tree expansion and backup is given after this table.)
Open Source Code | No | Our implementations are based on OpenAI Baselines (Hesse et al., 2017).
Open Datasets | Yes | Atari. To demonstrate the general applicability of TreeQN and ATreeC to complex environments, we evaluate them on the Atari 2600 suite (Bellemare et al., 2013).
Dataset Splits | No | The paper evaluates on the Atari 2600 suite and a custom box-pushing environment but does not specify explicit train/validation/test splits. Box-pushing levels are randomly generated for each episode, and no splits are mentioned for Atari.
Hardware Specification | Yes | The NVIDIA DGX-1 used for this research was donated by the NVIDIA Corporation.
Software Dependencies | No | The paper mentions that 'Our implementations are based on OpenAI Baselines (Hesse et al., 2017)' and that 'All experiments use RMSProp', but it does not specify version numbers for any software libraries or frameworks.
Experiment Setup | Yes | Full details of the experimental setup, as well as architecture and training hyperparameters, are given in the appendix. For instance, 'All experiments use RMSProp (Tieleman & Hinton, 2012) with a learning rate of 1e-4, a decay of α = 0.99, and ϵ = 1e-5' and 'We use n_steps = 5 and n_envs = 16, for a total batch size of 80.' (An optimizer configuration sketch follows this table.)
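
Since the paper presents the algorithm only in prose and equations, the following is a minimal sketch of the TreeQN idea as we read it: a learned encoder, per-action transition and reward functions, and a recursive backup that mixes each node's value estimate with the best child Q-value. Everything here is an assumption for illustration: the module and parameter names (TreeQNSketch, encoder, transition, reward_fn, value_fn, lam) are ours, the architecture is heavily simplified, and the sketch uses PyTorch even though the authors' implementation builds on OpenAI Baselines.

```python
import torch
import torch.nn as nn

class TreeQNSketch(nn.Module):
    """Hedged sketch of a TreeQN-style differentiable tree.

    All names and architecture choices are illustrative stand-ins,
    not the authors' implementation.
    """

    def __init__(self, obs_dim, n_actions, latent_dim=64, depth=2,
                 gamma=0.99, lam=0.8):
        super().__init__()
        self.n_actions = n_actions
        self.depth = depth        # number of transitions to expand (>= 1)
        self.gamma = gamma
        self.lam = lam            # mixes node value with best child Q
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        # One learned transition per action keeps the sketch simple.
        self.transition = nn.ModuleList(
            [nn.Linear(latent_dim, latent_dim) for _ in range(n_actions)])
        self.reward_fn = nn.Linear(latent_dim, n_actions)  # r(z, a)
        self.value_fn = nn.Linear(latent_dim, 1)           # V(z)

    def q_values(self, z, depth):
        """Expand the tree below latent state z and back up Q-values."""
        rewards = self.reward_fn(z)                        # (batch, n_actions)
        q_per_action = []
        for a in range(self.n_actions):
            z_next = torch.relu(self.transition[a](z))
            if depth == 1:
                backup = self.value_fn(z_next).squeeze(-1)  # leaf: V(z')
            else:
                v = self.value_fn(z_next).squeeze(-1)
                q_child = self.q_values(z_next, depth - 1)
                # Mix the node's own value with the best child Q-value.
                backup = (1 - self.lam) * v + self.lam * q_child.max(-1).values
            q_per_action.append(rewards[..., a] + self.gamma * backup)
        return torch.stack(q_per_action, dim=-1)

    def forward(self, obs):
        return self.q_values(self.encoder(obs), self.depth)
```

A forward pass on a batch of observations returns tree-backed-up Q-values: for example, `TreeQNSketch(obs_dim=128, n_actions=4)(torch.randn(8, 128))` yields a (8, 4) tensor of Q-values, one per action, fully differentiable through the tree.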
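The quoted training hyperparameters map naturally onto a standard RMSProp configuration. The snippet below is a hedged illustration: only the numeric values (learning rate, decay, epsilon, n_steps, n_envs) come from the paper's appendix; the PyTorch optimizer call and the TreeQNSketch instantiation are our own stand-ins, not the authors' setup.

```python
import torch

# Hyperparameter values quoted from the paper's appendix; the PyTorch
# mapping is our assumption (the authors built on OpenAI Baselines).
N_STEPS = 5                     # n-step return length
N_ENVS = 16                     # parallel environments
BATCH_SIZE = N_STEPS * N_ENVS   # total batch size of 80

# Hypothetical instantiation; obs_dim and n_actions are placeholders.
model = TreeQNSketch(obs_dim=128, n_actions=4)

optimizer = torch.optim.RMSprop(
    model.parameters(),
    lr=1e-4,      # learning rate
    alpha=0.99,   # RMSProp decay
    eps=1e-5,
)
```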