Interference and Generalization in Temporal Difference Learning
Authors: Emmanuel Bengio, Joelle Pineau, Doina Precup
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For the generalization experiments of Section 4 we loosely follow the setup of Zhang et al. (2018a): we train RL agents in environments where the initial state is induced by a single random seed, allowing us to have proper training and test sets in the form of mutually exclusive seeds. In particular, to allow for closer comparisons between RL and SL, we compare classifiers trained on SVHN (Netzer et al., 2011) and CIFAR10 (Krizhevsky, 2009) to agents that learn to progressively explore a masked image (from those datasets) while attempting to classify it. |
| Researcher Affiliation | Collaboration | ¹Mila, McGill University. ²Work partly done while the author was an intern at DeepMind. ³DeepMind. |
| Pseudocode | No | The information is insufficient. The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code is available in the supplementary materials. |
| Open Datasets | Yes | In particular, to allow for closer comparisons between RL and SL, we compare classifiers trained on SVHN (Netzer et al., 2011) and CIFAR10 (Krizhevsky, 2009)... As much of the existing deep learning literature on generalization focuses on classifiers, but estimating value functions is arguably closer to regression, we include two regression experiments using SARCOS (Vijayakumar & Schaal, 2000) and the California Housing dataset (Pace & Barry, 1997). Finally, for the interactive environment experiments of Section 5, we investigate some metrics on the popular Atari environment (Bellemare et al., 2013)... |
| Dataset Splits | No | The information is insufficient. The paper mentions 'training and test sets' but does not explicitly provide details about validation splits (e.g., percentages, sample counts, or the strategy used to select them). |
| Hardware Specification | No | The information is insufficient. The paper does not provide specific hardware details (such as GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The information is insufficient. The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | All architectural details and hyperparameter ranges are listed in appendix B. |