True Online TD(λ)

Authors: Harm van Seijen, Richard S. Sutton

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our empirical comparisons, our algorithm outperformed TD(λ) in all of its variations. It seems that, by adhering more truly to the original goal of TD(λ), matching an intuitively clear forward view even in the online case, we have found a new algorithm that simply improves on classical TD(λ)."
Researcher Affiliation | Academia | Harm van Seijen (HARM.VANSEIJEN@UALBERTA.CA) and Richard S. Sutton (SUTTON@CS.UALBERTA.CA), Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada
Pseudocode | Yes | Algorithm 1: linear TD(λ) with accumulating traces (a sketch of this baseline follows the table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper mentions the 'random-walk task' and the 'standard mountain car task (Sutton & Barto, 1998)', which are well-known benchmark problems/environments in reinforcement learning. However, it does not provide a specific link, DOI, repository, or explicit citation for a downloadable dataset.
Dataset Splits | No | The paper describes the setup of the random-walk and mountain car tasks but does not provide specific details on training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names and versions) needed to replicate the experiment.
Experiment Setup | Yes | The transition probability in the direction of the terminal state, p, is set to 0.9; θ is initialized to 0 and γ = 0.99. α is swept from 0 to 1.5 in steps of 0.01, and λ from 0 to 0.9 in steps of 0.1 and from 0.9 to 1.0 in steps of 0.025. The mountain car task uses 10 tilings of 10 × 10 tiles each. Results are plotted for λ = 0.9 and α = α0/10, for α0 from 0.2 to 2.0 in steps of 0.2. (A sketch of the random-walk parameter sweep follows the table.)
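
The pseudocode the paper provides is its Algorithm 1, linear TD(λ) with accumulating traces. Below is a minimal Python sketch of that baseline for policy evaluation; the `env` interface (`reset()` returning a feature vector, `step()` returning reward, next features, and a done flag) is an illustrative assumption, not something specified in the paper.

```python
import numpy as np

def linear_td_lambda(env, num_episodes, alpha, lam, gamma, num_features):
    """Sketch of linear TD(lambda) with accumulating traces (policy evaluation)."""
    theta = np.zeros(num_features)               # weight vector, initialized to 0
    for _ in range(num_episodes):
        phi = env.reset()                        # feature vector of the start state
        e = np.zeros(num_features)               # eligibility trace, reset each episode
        done = False
        while not done:
            reward, phi_next, done = env.step()  # one step under the fixed policy
            v = theta @ phi                      # current value estimate
            v_next = 0.0 if done else theta @ phi_next
            delta = reward + gamma * v_next - v  # TD error
            e = gamma * lam * e + phi            # accumulating-trace update
            theta = theta + alpha * delta * e    # weight update along the trace
            phi = phi_next
    return theta
```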
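
The experiment-setup row spells out the random-walk sweep explicitly. A minimal sketch of that parameter grid is shown below; the grid values come from the paper's text, while `run_random_walk` is a hypothetical evaluation helper that the paper does not provide.

```python
import numpy as np

p = 0.9        # probability of stepping toward the terminal state
gamma = 0.99   # discount factor; theta is initialized to 0 inside the learner

alphas = np.arange(0.0, 1.5 + 1e-9, 0.01)          # alpha from 0 to 1.5 in steps of 0.01
lambdas = np.concatenate([
    np.arange(0.0, 0.9, 0.1),                      # lambda from 0 to 0.9 in steps of 0.1
    np.arange(0.9, 1.0 + 1e-9, 0.025),             # and from 0.9 to 1.0 in steps of 0.025
])

# results[i, j] would hold the error for (lambdas[i], alphas[j]);
# run_random_walk is an assumed helper, not provided by the paper.
# results = np.array([[run_random_walk(p, gamma, lam, alpha)
#                      for alpha in alphas] for lam in lambdas])
```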