Convergent Tree Backup and Retrace with Function Approximation

Authors: Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 7. Experimental Results: To validate our theoretical results about instability, we implemented TB(λ) and Retrace(λ) and compared them against their gradient-based counterparts, GTB(λ) and GRetrace(λ), derived in this paper. We evaluated them on two problems: the first is the 2-state counterexample detailed in the third section, and the second is the 7-state version of Baird's counterexample (Baird, 1995). Figures 2 and 3 show the MSBPE (averaged over 20 runs) as a function of the number of iterations. (A divergence sketch on Baird's counterexample appears after the table.)
Researcher Affiliation | Collaboration | MILA, Université de Montréal; Facebook AI Research; MILA, McGill University; Canadian Institute for Advanced Research (CIFAR).
Pseudocode | Yes | Algorithm 1: Gradient Off-policy with eligibility traces. (See the update sketch after the table.)
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is publicly available, nor does it provide a link to a repository.
Open Datasets | Yes | As in Mahmood et al. (2017), we also consider a policy evaluation task in the Mountain Car domain. ... We chose to describe state-action pairs by a 96-dimensional vector of features derived by tile coding (Sutton & Barto, 1998). (See the tile-coding sketch after the table.)
Dataset Splits | No | The paper describes experiments in a reinforcement learning environment (the Mountain Car domain) where training occurs over 2000 episodes rather than using static train/validation/test dataset splits. No specific dataset split information is provided.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or cloud computing instance types used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions).
Experiment Setup | Yes | We ran each algorithm over all possible combinations of step-size values (α_k, η_k) ∈ {0.001, 0.005, 0.01, 0.05, 0.1}² for 2000 episodes and reported their normalized mean squared errors (NMSE). (See the step-size sweep sketch after the table.)
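
For context on the instability discussed in the Research Type row, here is a minimal sketch of divergence on the 7-state Baird counterexample, in the standard Sutton & Barto formulation. Note the hedges: the update shown is plain importance-sampled semi-gradient TD(0), not the paper's TB(λ)/Retrace(λ) updates, and the discount factor, step size, and initialization are assumptions; it only illustrates the kind of instability Figures 2 and 3 measure.

```python
import numpy as np

# Baird's 7-state "star" counterexample (Baird, 1995): 7 states,
# 8 weights, linear values, all rewards zero.
n_states, n_weights = 7, 8
phi = np.zeros((n_states, n_weights))
for i in range(6):                 # upper states: V(s_i) = 2*w_i + w_7
    phi[i, i], phi[i, 7] = 2.0, 1.0
phi[6, 6], phi[6, 7] = 1.0, 2.0    # lower state: V(s_6) = w_6 + 2*w_7

gamma, alpha = 0.99, 0.01          # assumed discount and step size
theta = np.ones(n_weights)
theta[6] = 10.0                    # classic initialization

rng = np.random.default_rng(0)
s = rng.integers(n_states)
for step in range(2001):
    s_next = rng.integers(n_states)  # behavior: uniform over all 7 states
    # Target policy always jumps to the lower state (index 6), so the
    # importance ratio is pi/mu = 1/(1/7) = 7 there and 0 elsewhere.
    rho = 7.0 if s_next == 6 else 0.0
    delta = gamma * phi[s_next] @ theta - phi[s] @ theta  # zero rewards
    theta += alpha * rho * delta * phi[s]  # semi-gradient off-policy TD(0)
    if step % 500 == 0:
        print(step, np.linalg.norm(theta))  # the weight norm keeps growing
    s = s_next
```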
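The Pseudocode row refers to Algorithm 1, "Gradient Off-policy with eligibility traces". The following is a hedged sketch in the spirit of GTD2-style two-timescale saddle-point updates with traces; the exact correction terms in the paper's Algorithm 1 may differ, and the interfaces `phi`, `pi`, `mu`, and `episodes` are hypothetical.

```python
import numpy as np

def gradient_offpolicy_traces(episodes, phi, pi, mu, n_actions,
                              gamma=0.99, lam=0.8, alpha=0.01, eta=0.05,
                              variant="tb"):
    """Sketch of a two-timescale gradient off-policy update with traces.
    phi(s, a) returns a feature vector; pi(a, s) / mu(a, s) are target /
    behavior action probabilities; episodes yields lists of
    (s, a, r, s_next) transitions -- all hypothetical interfaces.
    `variant` selects the trace coefficient: TB(lambda) uses pi(a|s),
    Retrace(lambda) uses min(1, pi/mu)."""
    d = phi(0, 0).shape[0]
    theta = np.zeros(d)   # primary weights (value estimate), step size alpha
    omega = np.zeros(d)   # secondary weights (correction), step size eta
    for episode in episodes:
        e = np.zeros(d)   # eligibility trace, reset at episode start
        for (s, a, r, s_next) in episode:
            kappa = pi(a, s) if variant == "tb" else min(1.0, pi(a, s) / mu(a, s))
            e = gamma * lam * kappa * e + phi(s, a)
            # Expected features at s_next under the target policy.
            exp_phi = sum(pi(b, s_next) * phi(s_next, b)
                          for b in range(n_actions))
            delta = r + gamma * exp_phi @ theta - phi(s, a) @ theta
            # Saddle-point updates; the paper's exact gradient-correction
            # term may differ from this GTD2-style sketch.
            theta = theta + alpha * (delta * e - gamma * (e @ omega) * exp_phi)
            omega = omega + eta * (delta * e - (phi(s, a) @ omega) * phi(s, a))
    return theta
```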
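The Open Datasets row mentions a 96-dimensional tile-coded representation of state-action pairs for Mountain Car. Below is a minimal sketch of grid tile coding; the tiling count, grid size, and offsets are illustrative assumptions (they yield 192 dimensions, not the paper's 96, whose exact configuration is not stated in this report).

```python
import numpy as np

POS_RANGE = (-1.2, 0.6)    # Mountain Car position bounds
VEL_RANGE = (-0.07, 0.07)  # Mountain Car velocity bounds

def tile_features(pos, vel, action, n_tilings=4, n_bins=4, n_actions=3):
    """Binary features from grid tile coding (Sutton & Barto, 1998).
    Each tiling is an offset n_bins x n_bins grid over (pos, vel); exactly
    one cell per tiling is active, and each action gets a disjoint block.
    The configuration here (4 tilings x 4x4 grid x 3 actions = 192 dims)
    is an illustrative assumption."""
    per_action = n_tilings * n_bins * n_bins
    x = np.zeros(n_actions * per_action)
    p = (pos - POS_RANGE[0]) / (POS_RANGE[1] - POS_RANGE[0])
    v = (vel - VEL_RANGE[0]) / (VEL_RANGE[1] - VEL_RANGE[0])
    for t in range(n_tilings):
        offset = t / (n_tilings * n_bins)      # diagonally offset tilings
        i = min(int((p + offset) * n_bins), n_bins - 1)
        j = min(int((v + offset) * n_bins), n_bins - 1)
        x[action * per_action + t * n_bins * n_bins + i * n_bins + j] = 1.0
    return x
```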
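Finally, the Experiment Setup row describes a sweep over all 25 step-size pairs with NMSE reporting. A sketch of such a sweep follows; `run_policy_evaluation` is a placeholder stub (a real run would train the algorithm for 2000 episodes), and the NMSE definition shown, a state-weighted normalized error, is a common choice assumed here rather than the paper's stated formula.

```python
import itertools
import numpy as np

def nmse(v_hat, v_true, d):
    # Normalized mean squared error under state weights d -- a common
    # definition; the paper's exact normalization is assumed here.
    return float(d @ (v_hat - v_true) ** 2) / float(d @ v_true ** 2)

def run_policy_evaluation(alpha, eta, n_episodes):
    # Placeholder stub: a real run would train GTB(lambda)/GRetrace(lambda)
    # for n_episodes with step sizes (alpha, eta) and return predicted
    # values, reference values, and state weights. Random values here
    # only keep the sketch runnable.
    rng = np.random.default_rng(int(alpha * 1e4) + int(eta * 1e6))
    v_true = rng.normal(size=50)
    v_hat = v_true + rng.normal(scale=alpha + eta, size=50)
    d = np.full(50, 1.0 / 50)
    return v_hat, v_true, d

step_sizes = [0.001, 0.005, 0.01, 0.05, 0.1]
results = {}
for alpha, eta in itertools.product(step_sizes, repeat=2):  # all 25 pairs
    v_hat, v_true, d = run_policy_evaluation(alpha, eta, n_episodes=2000)
    results[(alpha, eta)] = nmse(v_hat, v_true, d)

best = min(results, key=results.get)
print("best (alpha, eta):", best, "NMSE:", results[best])
```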