Convergent Tree Backup and Retrace with Function Approximation
Authors: Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7. Experimental Results: To validate our theoretical results about instability, we implemented TB(λ) and RETRACE(λ) and compared them against their gradient-based counterparts GTB(λ) and GRETRACE(λ) derived in this paper. The first domain is the two-state counterexample detailed in the third section and the second is the seven-state version of Baird's counterexample (Baird et al., 1995). Figures 2 and 3 show the MSBPE (averaged over 20 runs) as a function of the number of iterations. |
| Researcher Affiliation | Collaboration | 1MILA, Université de Montréal 2Facebook AI Research 3MILA, McGill University 4Canadian Institute for Advanced Research (CIFAR). |
| Pseudocode | Yes | Algorithm 1 Gradient Off-policy with eligibility traces |
| Open Source Code | No | The paper does not explicitly state that the source code for their methodology is publicly available, nor does it provide a link to a repository. |
| Open Datasets | Yes | As in Mahmood et al. (2017), we also consider a policy evaluation task in the Mountain Car domain. ... We chose to describe state-action pairs by a 96-dimensional vector of features derived by tile coding (Sutton & Barto, 1998). |
| Dataset Splits | No | The paper describes experiments in a reinforcement learning environment ("Mountain Car domain") where training occurs over "2000 episodes" rather than using static train/validation/test dataset splits. No specific dataset split information is provided. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | We ran each algorithm over all possible combinations of step-size values (αk, ηk) ∈ {0.001, 0.005, 0.01, 0.05, 0.1}² for 2000 episodes and reported their normalized mean squared errors (NMSE). |
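The experiment-setup row describes an exhaustive sweep over all 25 step-size pairs, selecting by normalized mean squared error. A minimal sketch of that sweep is below; `evaluate` is a hypothetical callback standing in for one 2000-episode policy-evaluation run (the paper does not specify its implementation), and lower NMSE is better.

```python
import itertools

# Step-size grid from the paper's setup:
# all (alpha, eta) pairs drawn from {0.001, 0.005, 0.01, 0.05, 0.1}^2.
STEP_SIZES = [0.001, 0.005, 0.01, 0.05, 0.1]

def grid_search(evaluate, step_sizes=STEP_SIZES):
    """Run `evaluate(alpha, eta)` for every step-size pair and return
    the best pair together with its NMSE.

    `evaluate` is a hypothetical stand-in for a full policy-evaluation
    run; it must return a scalar NMSE (lower is better).
    """
    results = {}
    for alpha, eta in itertools.product(step_sizes, repeat=2):
        results[(alpha, eta)] = evaluate(alpha, eta)
    best = min(results, key=results.get)
    return best, results[best]
```

For example, `grid_search(lambda a, e: a + e)` would scan all 25 pairs and return the pair with the smallest sum, `(0.001, 0.001)`.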