Towards a better understanding of representation dynamics under TD-learning
Authors: Yunhao Tang, Remi Munos
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate this theoretical insight with tabular and deep RL experiments over Atari game suites. |
| Researcher Affiliation | Industry | Google DeepMind. Correspondence to: Yunhao Tang <robintyh@deepmind.com>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide a statement about releasing its own source code, nor does it provide a link to a code repository for its methodology. |
| Open Datasets | Yes | We use DQN (Mnih et al., 2013) as a baseline, and generate random reward functions R^π_i(x, a) via outputs of randomly initialized networks, following the practice of (Dabney et al., 2021). [...] Our testbed is a subset of 15 Atari games (Bellemare et al., 2013) on which it has been shown that DQN can achieve reasonable performance. (A hedged sketch of this random-reward construction appears below the table.) |
| Dataset Splits | No | The paper mentions 'validation' in the context of value approximation error decay (Figure 1), but it does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, and testing sets in its empirical deep RL experiments. Although hyperparameters were tuned, no train/validation split procedure is described. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | Yes | All results are based on solving the exact ODE dynamics, using the Scipy ODE solver (Virtanen et al., 2020). (An illustrative SciPy ODE sketch appears below the table.) |
| Experiment Setup | Yes | We tune the learning rate η ∈ {0.00025, 0.0001, 0.00005} as suggested in (Dabney et al., 2021). The default DQN uses η = 0.00025. We find that at η = 0.0001 the tuned DQN performs the best. For the auxiliary task, we tune the number of random rewards h ∈ {4, 16, 64, 256}. We find that h = 16 performs slightly better than other alternatives. (The tuning grid is sketched below the table.) |
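
The "Open Datasets" row quotes the paper's use of random reward functions produced by randomly initialized networks, following Dabney et al. (2021). The snippet below is a minimal sketch of that idea, not the authors' code: the network width, the observation size, and the state-only (rather than state-action) rewards are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
h, obs_dim, hidden = 16, 84 * 84, 256  # h = 16 as reported best; obs_dim and hidden are assumed

# Frozen, randomly initialized two-layer network whose h outputs serve as auxiliary rewards.
W1 = rng.normal(scale=1.0 / np.sqrt(obs_dim), size=(obs_dim, hidden))
W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, h))

def random_rewards(obs_batch: np.ndarray) -> np.ndarray:
    """Map flattened observations of shape (B, obs_dim) to (B, h) fixed random rewards."""
    return np.maximum(obs_batch @ W1, 0.0) @ W2
```

Because the weights are never trained, each output dimension defines a fixed pseudo-reward that an auxiliary value head can learn alongside the main DQN objective.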
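The "Software Dependencies" row notes that the paper's analytical results come from solving exact ODE dynamics with SciPy. Below is an illustrative sketch, under assumed shapes and a uniform state weighting, of how an expected linear-TD ODE of the form dθ/dt = Φᵀ(R + γPΦθ − Φθ) can be integrated with `scipy.integrate.solve_ivp`; it is not the authors' setup.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
S, d, gamma = 10, 4, 0.9                 # states, feature dimension, discount (assumed)
P = rng.dirichlet(np.ones(S), size=S)    # random transition matrix under a fixed policy
R = rng.normal(size=S)                   # reward vector
Phi = rng.normal(size=(S, d))            # fixed feature matrix

def td_vector_field(t, theta):
    """Expected linear-TD update with uniform state weighting (a simplification)."""
    v = Phi @ theta
    return Phi.T @ (R + gamma * (P @ v) - v)

sol = solve_ivp(td_vector_field, t_span=(0.0, 50.0), y0=np.zeros(d))
theta_final = sol.y[:, -1]               # parameters at the end of the ODE trajectory
print(theta_final)
```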
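The "Experiment Setup" row describes tuning the learning rate and the number of random rewards. A hedged sketch of that grid follows; whether the two hyperparameters were swept jointly or independently is not stated in the quote, and `run_dqn_with_random_rewards` is a hypothetical entry point.

```python
from itertools import product

learning_rates = [0.00025, 0.0001, 0.00005]  # 0.00025 is the DQN default; 0.0001 reported best
num_random_rewards = [4, 16, 64, 256]        # h = 16 reported slightly better

for eta, h in product(learning_rates, num_random_rewards):
    config = {"learning_rate": eta, "num_random_rewards": h}
    # run_dqn_with_random_rewards(**config)  # hypothetical training entry point
    print(config)
```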