Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A new Q(lambda) with interim forward view and Monte Carlo equivalence
Authors: Rich Sutton, Ashique Rupam Mahmood, Doina Precup, Hado van Hasselt
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The main contributions of this paper are 1) the two new off-policy algorithms PTD(λ) and PQ(λ); 2) a new forward view of these algorithms that ensures Monte Carlo equivalence at λ = 1; 3) the notion of an interim forward view and a technique for using it to derive and prove equivalence of backward-view algorithms; and 4) applications of the technique to derive and prove equivalences for PTD(λ) and PQ(λ). |
| Researcher Affiliation | Academia | Richard S. Sutton, A. Rupam Mahmood EMAIL Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Edmonton, AB T6G 2E8 Canada Doina Precup EMAIL School of Computer Science, McGill University, Montréal, QC H3A 0G4 Canada Hado van Hasselt EMAIL Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Edmonton, AB T6G 2E8 Canada |
| Pseudocode | No | The paper describes algorithms using mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements or links regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not report on experiments or use datasets, so no information about publicly available datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments or dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup or hyperparameters. |