Interference and Generalization in Temporal Difference Learning

Authors: Emmanuel Bengio, Joelle Pineau, Doina Precup

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3. Empirical Setup: For the generalization experiments of Section 4 we loosely follow the setup of Zhang et al. (2018a): we train RL agents in environments where the initial state is induced by a single random seed, allowing us to have proper training and test sets in the form of mutually exclusive seeds. In particular, to allow for closer comparisons between RL and SL, we compare classifiers trained on SVHN (Netzer et al., 2011) and CIFAR10 (Krizhevsky, 2009) to agents that learn to progressively explore a masked image (from those datasets) while attempting to classify it.
Researcher Affiliation | Collaboration | 1 Mila, McGill University; 2 Work partly done while the author was an intern at DeepMind; 3 DeepMind.
Pseudocode | No | The information is insufficient. The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | All code is available in the supplementary materials.
Open Datasets | Yes | In particular, to allow for closer comparisons between RL and SL, we compare classifiers trained on SVHN (Netzer et al., 2011) and CIFAR10 (Krizhevsky, 2009)... As much of the existing deep learning literature on generalization focuses on classifiers, but estimating value functions is arguably closer to regression, we include two regression experiments using SARCOS (Vijayakumar & Schaal, 2000) and the California Housing dataset (Pace & Barry, 1997). Finally, for the interactive environment experiments of Section 5, we investigate some metrics on the popular Atari environment (Bellemare et al., 2013)...
Dataset Splits | No | The information is insufficient. The paper mentions 'training and test sets' but does not explicitly provide details about validation dataset splits (e.g., percentages, sample counts, or specific strategies for validation).
Hardware Specification | No | The information is insufficient. The paper does not provide specific hardware details (such as GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The information is insufficient. The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | All architectural details and hyperparameter ranges are listed in appendix B.
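
The Research Type entry above describes training and test sets built from mutually exclusive seeds, where each seed induces an initial state. A minimal sketch of that idea follows. It is not the authors' code: the helper names `split_seeds` and `initial_state`, the pool of 100 seeds, and the 20% test fraction are all hypothetical choices made for illustration, since the paper's split sizes are not reported here.

```python
import random


def split_seeds(num_seeds, test_fraction=0.2, rng_seed=0):
    """Partition seeds 0..num_seeds-1 into disjoint train and test sets.

    The pool size and test fraction are illustrative assumptions.
    """
    seeds = list(range(num_seeds))
    # A dedicated generator keeps the shuffle reproducible and isolated
    # from the global random state.
    random.Random(rng_seed).shuffle(seeds)
    n_test = round(num_seeds * test_fraction)
    return set(seeds[n_test:]), set(seeds[:n_test])


def initial_state(seed, size=4):
    """Hypothetical stand-in for an environment reset: the seed alone
    determines the initial state."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]


train_seeds, test_seeds = split_seeds(100)

# No seed appears in both sets, so no test initial state is seen in training.
assert train_seeds.isdisjoint(test_seeds)
```

Because the initial state is a deterministic function of the seed, keeping the two seed sets disjoint is what makes the held-out evaluation meaningful in this sketch.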