True Online TD(λ)
Authors: Harm van Seijen, Richard S. Sutton
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical comparisons, our algorithm outperformed TD(λ) in all of its variations. It seems, by adhering more truly to the original goal of TD(λ), matching an intuitively clear forward view even in the online case, that we have found a new algorithm that simply improves on classical TD(λ). |
| Researcher Affiliation | Academia | Harm van Seijen (harm.vanseijen@ualberta.ca) and Richard S. Sutton (sutton@cs.ualberta.ca), Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada |
| Pseudocode | Yes | Algorithm 1: linear TD(λ) with accumulating traces |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper mentions 'random-walk task' and 'standard mountain car task (Sutton & Barto, 1998)', which are well-known benchmark problems/environments in reinforcement learning. However, it does not provide a specific link, DOI, repository, or explicit citation for a downloadable dataset. |
| Dataset Splits | No | The paper describes the setup of the random-walk and mountain car tasks, but does not provide specific details on training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names and versions) needed to replicate the experiment. |
| Experiment Setup | Yes | The transition probability in the direction of the terminal state, p, is set to 0.9. Initial θ is 0 and γ = 0.99. α is varied from 0 to 1.5 in steps of 0.01, and λ from 0 to 0.9 in steps of 0.1 and from 0.9 to 1.0 in steps of 0.025. Tile coding uses 10 tilings of 10 × 10 tiles each. Results are plotted for λ = 0.9 and α = α₀/10, for α₀ from 0.2 to 2.0 in steps of 0.2. |
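Algorithm 1 (linear TD(λ) with accumulating traces) can be sketched in Python as follows. This is a minimal illustration of the standard update (δ = R + γθᵀφ′ − θᵀφ, then e ← γλe + φ, then θ ← θ + αδe), not the authors' code. The function `td_lambda_episode`, the `(phi, reward, phi_next)` transition format and the two-state toy episode are hypothetical. A terminal transition is assumed to use an all-zero `phi_next` so that its value is 0.

```python
from typing import List, Sequence, Tuple

# (phi, reward, phi_next); an all-zero phi_next marks a terminal transition (assumption)
Transition = Tuple[Sequence[float], float, Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def td_lambda_episode(theta: Sequence[float], transitions: Sequence[Transition],
                      alpha: float, gamma: float, lam: float) -> List[float]:
    """Run one episode of online linear TD(lambda) with accumulating traces."""
    theta = list(theta)
    e = [0.0] * len(theta)  # eligibility trace, reset at the start of each episode
    for phi, reward, phi_next in transitions:
        # TD error under the current weights
        delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
        # accumulating trace: decay by gamma*lambda, then add the current features
        e = [gamma * lam * ei + fi for ei, fi in zip(e, phi)]
        # move the weights along the trace
        theta = [ti + alpha * delta * ei for ti, ei in zip(theta, e)]
    return theta


# Hypothetical toy episode, one-hot features: A -> B (r=0), B -> A (r=1), A -> terminal (r=0)
A, B, TERMINAL = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
episode = [(A, 0.0, B), (B, 1.0, A), (A, 0.0, TERMINAL)]
theta = td_lambda_episode([0.0, 0.0], episode, alpha=0.5, gamma=1.0, lam=0.5)
print(theta)  # [0.09375, 0.4375]
```

Plain Python lists keep the sketch dependency-free. With a real feature dimension, a NumPy version of the same three lines per step would be the usual choice.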
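True online TD(λ), the algorithm the paper proposes, changes two things. It replaces the accumulating trace with one that folds in the step size, and it adds a correction term to the weight update. The sketch below follows our reading of those rules and should be checked against the paper's own pseudocode before use:

- δ′ = R + γθ_tᵀφ′ − v_old, where v_old = θ_{t−1}ᵀφ is the current state's value under the weights from one step earlier.
- e ← γλe + α(1 − γλeᵀφ)φ.
- θ ← θ + δ′e + α(v_old − θ_tᵀφ)φ.

The function `true_online_td_lambda_episode` and the toy episode are hypothetical.

```python
from typing import List, Sequence, Tuple

# (phi, reward, phi_next); an all-zero phi_next marks a terminal transition (assumption)
Transition = Tuple[Sequence[float], float, Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def true_online_td_lambda_episode(theta: Sequence[float], transitions: Sequence[Transition],
                                  alpha: float, gamma: float, lam: float) -> List[float]:
    """Run one episode of true online TD(lambda) with linear features (sketch)."""
    theta = list(theta)
    e = [0.0] * len(theta)  # trace; in this form it already includes the step size alpha
    v_old = None  # theta_{t-1}^T phi_t: current state's value under the previous weights
    for phi, reward, phi_next in transitions:
        if v_old is None:
            v_old = dot(theta, phi)  # first step: previous weights = initial weights
        v_next = dot(theta, phi_next)  # theta_t^T phi_{t+1}, computed before the update
        delta = reward + gamma * v_next - v_old
        # modified trace: e <- gamma*lam*e + alpha*(1 - gamma*lam*e^T phi)*phi
        scale = alpha * (1.0 - gamma * lam * dot(e, phi))
        e = [gamma * lam * ei + scale * fi for ei, fi in zip(e, phi)]
        # correction term alpha*(theta_{t-1}^T phi - theta_t^T phi)*phi, with theta_t pre-update
        correction = alpha * (v_old - dot(theta, phi))
        theta = [ti + delta * ei + correction * fi
                 for ti, ei, fi in zip(theta, e, phi)]
        v_old = v_next  # becomes theta_{t-1}^T phi_t at the next step
    return theta


# Same hypothetical toy episode: A -> B (r=0), B -> A (r=1), A -> terminal (r=0)
A, B, TERMINAL = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
episode = [(A, 0.0, B), (B, 1.0, A), (A, 0.0, TERMINAL)]
theta = true_online_td_lambda_episode([0.0, 0.0], episode, alpha=0.5, gamma=1.0, lam=0.5)
print(theta)  # [0.125, 0.5]
```

On this toy episode the sketch ends at [0.125, 0.5]. Conventional TD(λ) with accumulating traces and the same α = 0.5, γ = 1, λ = 0.5 ends at [0.09375, 0.4375]. With λ = 0 both updates reduce to linear TD(0).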
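The parameter sweeps in the Experiment Setup row can be written out explicitly. Note that the λ grid is non-uniform and becomes finer between 0.9 and 1.0. The row does not say which task each sentence refers to. The sketch below assumes that the α/λ sweep belongs to the random-walk task and that the α₀ sweep belongs to the tile-coded mountain car task. All variable names are hypothetical.

```python
from typing import List, Tuple

# Random-walk sweep (assumed): alpha from 0 to 1.5 in steps of 0.01.
# Integer counters plus round() avoid accumulating floating-point error.
ALPHAS: List[float] = [round(0.01 * i, 2) for i in range(151)]

# lambda from 0 to 0.9 in steps of 0.1, then from 0.9 to 1.0 in steps of 0.025
LAMBDAS: List[float] = ([round(0.1 * i, 1) for i in range(10)]
                        + [round(0.9 + 0.025 * i, 3) for i in range(1, 5)])

# Every (alpha, lambda) configuration of the sweep
grid: List[Tuple[float, float]] = [(a, lam) for lam in LAMBDAS for a in ALPHAS]

# Mountain car (assumed): lambda fixed at 0.9, alpha = alpha0 / 10
# for alpha0 = 0.2, 0.4, ..., 2.0
MC_LAMBDA = 0.9
MC_ALPHAS: List[float] = [round(0.2 * k, 1) / 10 for k in range(1, 11)]

print(len(ALPHAS), len(LAMBDAS), len(grid))  # 151 14 2114
print(LAMBDAS[9:])  # [0.9, 0.925, 0.95, 0.975, 1.0]
```

Stepping with integer counters and rounding avoids the drift that comes from repeatedly adding 0.01 or 0.025 to a float, so grid points such as λ = 0.9 compare equal to their literals.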
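The tile-coding features (10 tilings of 10 × 10 tiles each) can be sketched for the mountain car state. The sketch uses position in [−1.2, 0.5] and velocity in [−0.07, 0.07], the standard bounds from Sutton & Barto (1998). The source does not say how the tilings are offset, so the uniform per-tiling shift and the edge clamping below are assumptions. `active_tiles` and `binary_features` are hypothetical names. Each state activates exactly one tile per tiling, so φ has 10 ones among 1,000 features. That is consistent with scaling the step size as α = α₀/10.

```python
from typing import List

# Standard mountain-car state bounds (Sutton & Barto, 1998)
POS_MIN, POS_MAX = -1.2, 0.5
VEL_MIN, VEL_MAX = -0.07, 0.07


def active_tiles(position: float, velocity: float,
                 num_tilings: int = 10, tiles_per_dim: int = 10) -> List[int]:
    """Return one active feature index per tiling for an in-bounds mountain-car state.

    Each tiling is a tiles_per_dim x tiles_per_dim grid over the state space.
    Tiling t is shifted by t / num_tilings of a tile width in both dimensions
    (a simple uniform-offset scheme, assumed here for illustration).
    """
    # scale each state variable to [0, tiles_per_dim]
    x = (position - POS_MIN) / (POS_MAX - POS_MIN) * tiles_per_dim
    y = (velocity - VEL_MIN) / (VEL_MAX - VEL_MIN) * tiles_per_dim
    indices = []
    for t in range(num_tilings):
        offset = t / num_tilings
        # clamp so shifted coordinates past the upper edge fall into the last tile
        col = min(int(x + offset), tiles_per_dim - 1)
        row = min(int(y + offset), tiles_per_dim - 1)
        # each tiling owns its own block of tiles_per_dim**2 feature indices
        indices.append(t * tiles_per_dim ** 2 + row * tiles_per_dim + col)
    return indices


def binary_features(indices: List[int], num_features: int = 1000) -> List[float]:
    """Expand active indices into a binary feature vector phi (10 x 10 x 10 = 1000)."""
    phi = [0.0] * num_features
    for i in indices:
        phi[i] = 1.0
    return phi


tiles = active_tiles(-0.5, 0.0)
print(len(tiles), sum(binary_features(tiles)))  # 10 10.0
```

Clamping keeps the feature count at exactly 10 × 10 × 10. The cost is that the last tile in each shifted tiling is slightly wider. A common alternative gives each tiling one extra row and column instead.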