A new Q(λ) with interim forward view and Monte Carlo equivalence

Authors: Rich Sutton, Ashique Rupam Mahmood, Doina Precup, Hado van Hasselt

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | The main contributions of this paper are 1) the two new off-policy algorithms PTD(λ) and PQ(λ); 2) a new forward view of these algorithms that ensures Monte Carlo equivalence at λ = 1; 3) the notion of an interim forward view and a technique for using it to derive and prove equivalence of backward-view algorithms; and 4) applications of the technique to derive and prove equivalences for PTD(λ) and PQ(λ).
Researcher Affiliation | Academia | Richard S. Sutton, A. Rupam Mahmood {SUTTON,ASHIQUE}@CS.UALBERTA.CA Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Edmonton, AB T6G 2E8 Canada; Doina Precup DPRECUP@CS.MCGILL.CA School of Computer Science, McGill University, Montréal, QC H3A 0G4 Canada; Hado van Hasselt VANHASSE@CS.UALBERTA.CA Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Edmonton, AB T6G 2E8 Canada
Pseudocode | No | The paper describes algorithms using mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statements or links regarding the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not report on experiments or use datasets, so no information about publicly available datasets is provided.
Dataset Splits | No | The paper is theoretical and does not describe experiments or dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup or hyperparameters.