A new Q(lambda) with interim forward view and Monte Carlo equivalence
Authors: Rich Sutton, Ashique Rupam Mahmood, Doina Precup, Hado van Hasselt
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The main contributions of this paper are 1) the two new off-policy algorithms PTD(λ) and PQ(λ); 2) a new forward view of these algorithms that ensures Monte Carlo equivalence at λ = 1; 3) the notion of an interim forward view and a technique for using it to derive and prove equivalence of backward-view algorithms; and 4) applications of the technique to derive and prove equivalences for PTD(λ) and PQ(λ). |
| Researcher Affiliation | Academia | Richard S. Sutton, A. Rupam Mahmood {SUTTON,ASHIQUE}@CS.UALBERTA.CA Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Edmonton, AB T6G 2E8 Canada; Doina Precup DPRECUP@CS.MCGILL.CA School of Computer Science, McGill University, Montréal, QC H3A 0G4 Canada; Hado van Hasselt VANHASSE@CS.UALBERTA.CA Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Edmonton, AB T6G 2E8 Canada |
| Pseudocode | No | The paper describes algorithms using mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements or links regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not report on experiments or use datasets, so no information about publicly available datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments or dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup or hyperparameters. |