Towards a better understanding of representation dynamics under TD-learning

Authors: Yunhao Tang, Remi Munos

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate this theoretical insight with tabular and deep RL experiments over Atari game suites.
Researcher Affiliation | Industry | Google DeepMind. Correspondence to: Yunhao Tang <robintyh@deepmind.com>.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not state that its source code is released, nor does it link to a code repository for its methodology.
Open Datasets | Yes | We use DQN (Mnih et al., 2013) as a baseline, and generate random reward functions R_i^π(x, a) via outputs of randomly initialized networks, following the practice of (Dabney et al., 2021). [...] Our testbed is a subset of 15 Atari games (Bellemare et al., 2013) on which it has been shown that DQN can achieve reasonable performance.
Dataset Splits | No | The paper mentions 'validation' in the context of value approximation error decay (Figure 1), but it does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, and test sets in its empirical deep RL experiments. Hyperparameters were tuned, but the validation split used for tuning is not described.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies | Yes | All results are based on solving the exact ODE dynamics, using the SciPy ODE solver (Virtanen et al., 2020).
Experiment Setup | Yes | We tune the learning rate η ∈ {0.00025, 0.0001, 0.00005} as suggested in (Dabney et al., 2021). The default DQN uses η = 0.00025. We find that at η = 0.0001 the tuned DQN performs the best. For the auxiliary task, we tune the number of random rewards h ∈ {4, 16, 64, 256}. We find that h = 16 performs slightly better than other alternatives.
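The hyperparameter values quoted in the Experiment Setup row can be written down as a small sweep. The sketch below is illustrative only: the names `LEARNING_RATES`, `NUM_RANDOM_REWARDS`, and `sweep_configs` are hypothetical, and it assumes a full grid over both hyperparameters, whereas the quoted text does not say whether the learning rate and the number of random rewards were tuned jointly or separately.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
LEARNING_RATES = [0.00025, 0.0001, 0.00005]   # eta; 0.00025 is the DQN default
NUM_RANDOM_REWARDS = [4, 16, 64, 256]         # h, number of random rewards


def sweep_configs():
    """Return one config per (eta, h) pair.

    Assumption: a full grid. The quoted text does not state whether the
    two hyperparameters were tuned jointly.
    """
    return [
        {"learning_rate": lr, "num_random_rewards": h}
        for lr, h in product(LEARNING_RATES, NUM_RANDOM_REWARDS)
    ]


configs = sweep_configs()

# Per-hyperparameter values the quoted text reports as performing best.
reported_best = {"learning_rate": 0.0001, "num_random_rewards": 16}
```

Under the full-grid assumption this yields one configuration per pair of values, and `reported_best` is one of them.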
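The Software Dependencies row says the tabular results come from solving exact ODE dynamics with a SciPy solver, but the summary does not reproduce the ODE itself. The sketch below is therefore an assumption-laden illustration rather than the authors' setup: it assumes a linear value function V = Φw trained end to end with semi-gradient TD, uses `scipy.integrate.solve_ivp` (the summary does not say which SciPy routine was used), and runs on a small random Markov chain. The function name `td_dynamics` and all problem sizes are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp


def td_dynamics(t, y, P, R, gamma, n, k):
    """Assumed continuous-time semi-gradient TD dynamics for V = Phi @ w.

    The state vector y packs the flattened features Phi (n x k) followed by w (k).
    """
    Phi = y[: n * k].reshape(n, k)
    w = y[n * k:]
    td_error = R + gamma * P @ (Phi @ w) - Phi @ w   # shape (n,)
    dPhi = np.outer(td_error, w)                     # semi-gradient w.r.t. Phi
    dw = Phi.T @ td_error                            # semi-gradient w.r.t. w
    return np.concatenate([dPhi.ravel(), dw])


rng = np.random.default_rng(0)
n, k, gamma = 5, 2, 0.9                              # hypothetical sizes

# Random row-stochastic transition matrix and random rewards.
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
R = rng.standard_normal(n)

# Small random initialization of features and weights.
y0 = np.concatenate([
    0.1 * rng.standard_normal(n * k),
    0.1 * rng.standard_normal(k),
])

sol = solve_ivp(td_dynamics, (0.0, 10.0), y0, args=(P, R, gamma, n, k),
                rtol=1e-8, atol=1e-10)

Phi_T = sol.y[: n * k, -1].reshape(n, k)
w_T = sol.y[n * k:, -1]

# Exact value function of the chain, for comparison with Phi_T @ w_T.
V_true = np.linalg.solve(np.eye(n) - gamma * P, R)
value_error = np.linalg.norm(Phi_T @ w_T - V_true)
```

Packing Φ and w into one flat vector is only a convenience, since `solve_ivp` integrates a one-dimensional state. `value_error` gives the distance between the learned values and the true values at the final time.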