Interference and Generalization in Temporal Difference Learning
Authors: Emmanuel Bengio, Joelle Pineau, Doina Precup
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For the generalization experiments of Section 4 we loosely follow the setup of Zhang et al. (2018a): we train RL agents in environments where the initial state is induced by a single random seed, allowing us to have proper training and test sets in the form of mutually exclusive seeds. In particular, to allow for closer comparisons between RL and SL, we compare classifiers trained on SVHN (Netzer et al., 2011) and CIFAR10 (Krizhevsky, 2009) to agents that learn to progressively explore a masked image (from those datasets) while attempting to classify it. |
| Researcher Affiliation | Collaboration | ¹Mila, McGill University. ²Work partly done while the author was an intern at DeepMind. ³DeepMind. |
| Pseudocode | No | The information is insufficient. The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code is available in the supplementary materials. |
| Open Datasets | Yes | In particular, to allow for closer comparisons between RL and SL, we compare classifiers trained on SVHN (Netzer et al., 2011) and CIFAR10 (Krizhevsky, 2009)... As much of the existing deep learning literature on generalization focuses on classifiers, but estimating value functions is arguably closer to regression, we include two regression experiments using SARCOS (Vijayakumar & Schaal, 2000) and the California Housing dataset (Pace & Barry, 1997). Finally, for the interactive environment experiments of Section 5, we investigate some metrics on the popular Atari environment (Bellemare et al., 2013)... |
| Dataset Splits | No | The information is insufficient. The paper mentions 'training and test sets' but does not explicitly provide details about validation splits (e.g., percentages, sample counts, or the strategy used to select them). |
| Hardware Specification | No | The information is insufficient. The paper does not provide specific hardware details (such as GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The information is insufficient. The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | All architectural details and hyperparameter ranges are listed in appendix B. |