Convergent Tree Backup and Retrace with Function Approximation

Authors: Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 7. Experimental Results: To validate our theoretical results about instability, we implemented TB(λ) and Retrace(λ) and compared them against their gradient-based counterparts, GTB(λ) and GRetrace(λ), derived in this paper. We evaluated them on two problems: the first is the 2-state counterexample detailed in the third section, and the second is the 7-state version of Baird's counterexample (Baird, 1995). Figures 2 and 3 show the MSBPE (averaged over 20 runs) as a function of the number of iterations. (A divergence sketch on Baird's counterexample appears after the table.)
Researcher Affiliation | Collaboration | MILA, Université de Montréal; Facebook AI Research; MILA, McGill University; Canadian Institute for Advanced Research (CIFAR).
Pseudocode | Yes | Algorithm 1: Gradient Off-policy with eligibility traces. (See the update sketch after the table.)
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is publicly available, nor does it provide a link to a repository.
Open Datasets | Yes | As in Mahmood et al. (2017), we also consider a policy evaluation task in the Mountain Car domain. ... We chose to describe state-action pairs by a 96-dimensional vector of features derived by tile coding (Sutton & Barto, 1998). (See the tile-coding sketch after the table.)
Dataset Splits | No | The paper describes experiments in a reinforcement learning environment (the Mountain Car domain) where training occurs over 2000 episodes rather than using static train/validation/test dataset splits. No specific dataset split information is provided.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or cloud computing instance types used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions).
Experiment Setup | Yes | We ran each algorithm over all possible combinations of step-size values (α_k, η_k) ∈ {0.001, 0.005, 0.01, 0.05, 0.1}² for 2000 episodes and reported their normalized mean squared errors (NMSE). (See the step-size sweep sketch after the table.)
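
For context on the instability discussed in the Research Type row, here is a minimal sketch of divergence on the 7-state Baird counterexample, in the standard Sutton & Barto formulation. Note the hedges: the update shown is plain importance-sampled semi-gradient TD(0), not the paper's TB(λ)/Retrace(λ) updates, and the discount factor, step size, and initialization are assumptions; it only illustrates the kind of instability Figures 2 and 3 measure.

```python
import numpy as np

# Baird's 7-state "star" counterexample (Baird, 1995): 7 states,
# 8 weights, linear values, all rewards zero.
n_states, n_weights = 7, 8
phi = np.zeros((n_states, n_weights))
for i in range(6):                 # upper states: V(s_i) = 2*w_i + w_7
    phi[i, i], phi[i, 7] = 2.0, 1.0
phi[6, 6], phi[6, 7] = 1.0, 2.0    # lower state: V(s_6) = w_6 + 2*w_7

gamma, alpha = 0.99, 0.01          # assumed discount and step size
theta = np.ones(n_weights)
theta[6] = 10.0                    # classic initialization

rng = np.random.default_rng(0)
s = rng.integers(n_states)
for step in range(2001):
    s_next = rng.integers(n_states)  # behavior: uniform over all 7 states
    # Target policy always jumps to the lower state (index 6), so the
    # importance ratio is pi/mu = 1/(1/7) = 7 there and 0 elsewhere.
    rho = 7.0 if s_next == 6 else 0.0
    delta = gamma * phi[s_next] @ theta - phi[s] @ theta  # zero rewards
    theta += alpha * rho * delta * phi[s]  # semi-gradient off-policy TD(0)
    if step % 500 == 0:
        print(step, np.linalg.norm(theta))  # the weight norm keeps growing
    s = s_next
```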
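The Pseudocode row refers to Algorithm 1, "Gradient Off-policy with eligibility traces". The following is a hedged sketch in the spirit of GTD2-style two-timescale saddle-point updates with traces; the exact correction terms in the paper's Algorithm 1 may differ, and the interfaces `phi`, `pi`, `mu`, and `episodes` are hypothetical.

```python
import numpy as np

def gradient_offpolicy_traces(episodes, phi, pi, mu, n_actions,
                              gamma=0.99, lam=0.8, alpha=0.01, eta=0.05,
                              variant="tb"):
    """Sketch of a two-timescale gradient off-policy update with traces.
    phi(s, a) returns a feature vector; pi(a, s) / mu(a, s) are target /
    behavior action probabilities; episodes yields lists of
    (s, a, r, s_next) transitions -- all hypothetical interfaces.
    `variant` selects the trace coefficient: TB(lambda) uses pi(a|s),
    Retrace(lambda) uses min(1, pi/mu)."""
    d = phi(0, 0).shape[0]
    theta = np.zeros(d)   # primary weights (value estimate), step size alpha
    omega = np.zeros(d)   # secondary weights (correction), step size eta
    for episode in episodes:
        e = np.zeros(d)   # eligibility trace, reset at episode start
        for (s, a, r, s_next) in episode:
            kappa = pi(a, s) if variant == "tb" else min(1.0, pi(a, s) / mu(a, s))
            e = gamma * lam * kappa * e + phi(s, a)
            # Expected features at s_next under the target policy.
            exp_phi = sum(pi(b, s_next) * phi(s_next, b)
                          for b in range(n_actions))
            delta = r + gamma * exp_phi @ theta - phi(s, a) @ theta
            # Saddle-point updates; the paper's exact gradient-correction
            # term may differ from this GTD2-style sketch.
            theta = theta + alpha * (delta * e - gamma * (e @ omega) * exp_phi)
            omega = omega + eta * (delta * e - (phi(s, a) @ omega) * phi(s, a))
    return theta
```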
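The Open Datasets row mentions a 96-dimensional tile-coded representation of state-action pairs for Mountain Car. Below is a minimal sketch of grid tile coding; the tiling count, grid size, and offsets are illustrative assumptions (they yield 192 dimensions, not the paper's 96, whose exact configuration is not stated in this report).

```python
import numpy as np

POS_RANGE = (-1.2, 0.6)    # Mountain Car position bounds
VEL_RANGE = (-0.07, 0.07)  # Mountain Car velocity bounds

def tile_features(pos, vel, action, n_tilings=4, n_bins=4, n_actions=3):
    """Binary features from grid tile coding (Sutton & Barto, 1998).
    Each tiling is an offset n_bins x n_bins grid over (pos, vel); exactly
    one cell per tiling is active, and each action gets a disjoint block.
    The configuration here (4 tilings x 4x4 grid x 3 actions = 192 dims)
    is an illustrative assumption."""
    per_action = n_tilings * n_bins * n_bins
    x = np.zeros(n_actions * per_action)
    p = (pos - POS_RANGE[0]) / (POS_RANGE[1] - POS_RANGE[0])
    v = (vel - VEL_RANGE[0]) / (VEL_RANGE[1] - VEL_RANGE[0])
    for t in range(n_tilings):
        offset = t / (n_tilings * n_bins)      # diagonally offset tilings
        i = min(int((p + offset) * n_bins), n_bins - 1)
        j = min(int((v + offset) * n_bins), n_bins - 1)
        x[action * per_action + t * n_bins * n_bins + i * n_bins + j] = 1.0
    return x
```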
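Finally, the Experiment Setup row describes a sweep over all 25 step-size pairs with NMSE reporting. A sketch of such a sweep follows; `run_policy_evaluation` is a placeholder stub (a real run would train the algorithm for 2000 episodes), and the NMSE definition shown, a state-weighted normalized error, is a common choice assumed here rather than the paper's stated formula.

```python
import itertools
import numpy as np

def nmse(v_hat, v_true, d):
    # Normalized mean squared error under state weights d -- a common
    # definition; the paper's exact normalization is assumed here.
    return float(d @ (v_hat - v_true) ** 2) / float(d @ v_true ** 2)

def run_policy_evaluation(alpha, eta, n_episodes):
    # Placeholder stub: a real run would train GTB(lambda)/GRetrace(lambda)
    # for n_episodes with step sizes (alpha, eta) and return predicted
    # values, reference values, and state weights. Random values here
    # only keep the sketch runnable.
    rng = np.random.default_rng(int(alpha * 1e4) + int(eta * 1e6))
    v_true = rng.normal(size=50)
    v_hat = v_true + rng.normal(scale=alpha + eta, size=50)
    d = np.full(50, 1.0 / 50)
    return v_hat, v_true, d

step_sizes = [0.001, 0.005, 0.01, 0.05, 0.1]
results = {}
for alpha, eta in itertools.product(step_sizes, repeat=2):  # all 25 pairs
    v_hat, v_true, d = run_policy_evaluation(alpha, eta, n_episodes=2000)
    results[(alpha, eta)] = nmse(v_hat, v_true, d)

best = min(results, key=results.get)
print("best (alpha, eta):", best, "NMSE:", results[best])
```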