True Online TD(λ)

Authors: Harm van Seijen, Richard S. Sutton

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our empirical comparisons, our algorithm outperformed TD(λ) in all of its variations. It seems that, by adhering more truly to the original goal of TD(λ), matching an intuitively clear forward view even in the online case, we have found a new algorithm that simply improves on classical TD(λ)."
Researcher Affiliation | Academia | Harm van Seijen (HARM.VANSEIJEN@UALBERTA.CA) and Richard S. Sutton (SUTTON@CS.UALBERTA.CA), Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada
Pseudocode | Yes | Algorithm 1: linear TD(λ) with accumulating traces (a sketch of this baseline follows the table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper mentions the 'random-walk task' and the 'standard mountain car task (Sutton & Barto, 1998)', which are well-known benchmark problems/environments in reinforcement learning. However, it does not provide a specific link, DOI, repository, or explicit citation for a downloadable dataset.
Dataset Splits | No | The paper describes the setup of the random-walk and mountain car tasks but does not provide specific details on training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names and versions) needed to replicate the experiment.
Experiment Setup | Yes | The transition probability in the direction of the terminal state, p, is set to 0.9; θ is initialized to 0 and γ = 0.99. α is swept from 0 to 1.5 in steps of 0.01, and λ from 0 to 0.9 in steps of 0.1 and from 0.9 to 1.0 in steps of 0.025. The mountain car task uses 10 tilings of 10 × 10 tiles each. Results are plotted for λ = 0.9 and α = α0/10, for α0 from 0.2 to 2.0 in steps of 0.2. (A sketch of the random-walk parameter sweep follows the table.)
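
The pseudocode the paper provides is its Algorithm 1, linear TD(λ) with accumulating traces. Below is a minimal Python sketch of that baseline for policy evaluation; the `env` interface (`reset()` returning a feature vector, `step()` returning reward, next features, and a done flag) is an illustrative assumption, not something specified in the paper.

```python
import numpy as np

def linear_td_lambda(env, num_episodes, alpha, lam, gamma, num_features):
    """Sketch of linear TD(lambda) with accumulating traces (policy evaluation)."""
    theta = np.zeros(num_features)               # weight vector, initialized to 0
    for _ in range(num_episodes):
        phi = env.reset()                        # feature vector of the start state
        e = np.zeros(num_features)               # eligibility trace, reset each episode
        done = False
        while not done:
            reward, phi_next, done = env.step()  # one step under the fixed policy
            v = theta @ phi                      # current value estimate
            v_next = 0.0 if done else theta @ phi_next
            delta = reward + gamma * v_next - v  # TD error
            e = gamma * lam * e + phi            # accumulating-trace update
            theta = theta + alpha * delta * e    # weight update along the trace
            phi = phi_next
    return theta
```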
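
The experiment-setup row spells out the random-walk sweep explicitly. A minimal sketch of that parameter grid is shown below; the grid values come from the paper's text, while `run_random_walk` is a hypothetical evaluation helper that the paper does not provide.

```python
import numpy as np

p = 0.9        # probability of stepping toward the terminal state
gamma = 0.99   # discount factor; theta is initialized to 0 inside the learner

alphas = np.arange(0.0, 1.5 + 1e-9, 0.01)          # alpha from 0 to 1.5 in steps of 0.01
lambdas = np.concatenate([
    np.arange(0.0, 0.9, 0.1),                      # lambda from 0 to 0.9 in steps of 0.1
    np.arange(0.9, 1.0 + 1e-9, 0.025),             # and from 0.9 to 1.0 in steps of 0.025
])

# results[i, j] would hold the error for (lambdas[i], alphas[j]);
# run_random_walk is an assumed helper, not provided by the paper.
# results = np.array([[run_random_walk(p, gamma, lam, alpha)
#                      for alpha in alphas] for lam in lambdas])
```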