Non-delusional Q-learning and value-iteration

Authors: Tyler Lu, Dale Schuurmans, Craig Boutilier

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 2: Planning and learning in a grid world with random feature representations (left: 4×4 grid using 4 features; right: 5×5 grid using 5 features). Here an iteration means a full sweep over state-action pairs, except for Q-learning and PCQL, where an iteration is an episode of length 3/(1 − γ) = 60 using ε-greedy exploration with ε = 0.7. Dark lines: estimated maximum achievable expected value. Light lines: actual expected value achieved by the greedy policy. (A minimal sketch of this setup appears after the table.)
Researcher Affiliation | Industry | Tyler Lu (Google AI, tylerlu@google.com); Dale Schuurmans (Google AI, schuurmans@google.com); Craig Boutilier (Google AI, cboutilier@google.com)
Pseudocode | Yes | Algorithm 1: Policy-Class Value Iteration (PCVI); Algorithm 2: Policy-Class Q-learning (PCQL). (A sketch of the consistency test underlying both algorithms follows the table.)
Open Source Code | No | The paper contains no statement or link indicating that source code for the proposed methods (PCVI, PCQL) is publicly available.
Open Datasets | No | The experiments use a simple deterministic grid world with random feature representations, apparently custom-generated for the paper; no access information (link, DOI, or citation) for a publicly available dataset is given.
Dataset Splits | No | The paper mentions ε-greedy exploration with ε = 0.7 for training and full sweeps over state-action pairs, but specifies no dataset splits (e.g., percentages or counts) for training, validation, or testing.
Hardware Specification | No | The paper gives no details about the hardware (e.g., CPU or GPU models, memory, cloud instances) used to run the experiments.
Software Dependencies | No | The paper provides no version numbers for the software dependencies or libraries used to implement the algorithms and experiments.
Experiment Setup | Yes | Figure 2 states that an iteration is an episode of length 3/(1 − γ) = 60 using ε-greedy exploration with ε = 0.7. The paper also states that a linear approximator and random feature representations were used for the grid world experiments.
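
For concreteness, the episode length in Figure 2 follows from γ = 0.95: 3/(1 − γ) = 3/0.05 = 60 steps. The sketch below is a minimal reconstruction of the described learning loop, assuming a 4×4 grid flattened to 16 states, 4 actions, a random 4-dimensional feature map φ(s, a), a linear Q-approximator, ε-greedy exploration at ε = 0.7, and a standard semi-gradient Q-learning update. The transition dynamics, rewards, learning rate, and feature distribution here are placeholders, not taken from the paper.

    import numpy as np

    # Figure 2 constants: gamma = 0.95 gives episode length 3 / (1 - gamma) = 60;
    # exploration is epsilon-greedy with epsilon = 0.7.
    GAMMA, EPSILON = 0.95, 0.7
    EPISODE_LEN = round(3 / (1 - GAMMA))  # = 60

    rng = np.random.default_rng(0)

    # Assumed shapes: 4x4 grid -> 16 states, 4 actions, random 4-dim features.
    N_STATES, N_ACTIONS, N_FEATURES = 16, 4, 4
    phi = rng.normal(size=(N_STATES, N_ACTIONS, N_FEATURES))

    def epsilon_greedy(theta, s, eps=EPSILON):
        # Random action with probability eps, else greedy w.r.t. phi(s, .) . theta.
        if rng.random() < eps:
            return int(rng.integers(N_ACTIONS))
        return int(np.argmax(phi[s] @ theta))

    def q_learning_step(theta, s, a, r, s_next, alpha=0.1):
        # Semi-gradient Q-learning update for the linear approximator.
        td_target = r + GAMMA * np.max(phi[s_next] @ theta)
        return theta + alpha * (td_target - phi[s, a] @ theta) * phi[s, a]

    theta = np.zeros(N_FEATURES)
    for _ in range(EPISODE_LEN):         # one 60-step episode
        s = int(rng.integers(N_STATES))  # placeholder transitions; the paper's
        a = epsilon_greedy(theta, s)     # deterministic grid dynamics differ
        r, s_next = 0.0, int(rng.integers(N_STATES))
        theta = q_learning_step(theta, s, a, r, s_next)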
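
Algorithms 1 and 2 avoid delusion by only combining Q-value choices that some single policy in the greedy linear class can realize jointly. What follows is a hedged sketch of that consistency (witness) test alone, not the authors' full PCVI/PCQL: it uses a small linear program to ask whether one weight vector θ can make the greedy policy select every (state, action) pair in a candidate information set. The margin formulation, the box bounds on θ, and the scipy-based feasibility check are implementation assumptions.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    N_STATES, N_ACTIONS, N_FEATURES = 16, 4, 4
    phi = rng.normal(size=(N_STATES, N_ACTIONS, N_FEATURES))  # same assumed features as above

    def consistent(assignments, phi, n_actions):
        # Is there one theta with phi(s, a) . theta > phi(s, a') . theta
        # for every (s, a) in `assignments` and every a' != a?
        # Maximize a shared margin t under box bounds on theta; consistent iff t > 0.
        d = phi.shape[-1]
        rows, rhs = [], []
        for s, a in assignments:
            for a2 in range(n_actions):
                if a2 == a:
                    continue
                diff = phi[s, a] - phi[s, a2]
                rows.append(np.concatenate([-diff, [1.0]]))  # -diff.theta + t <= 0
                rhs.append(0.0)
        res = linprog(
            c=np.concatenate([np.zeros(d), [-1.0]]),   # minimize -t, i.e. maximize t
            A_ub=np.array(rows), b_ub=np.array(rhs),
            bounds=[(-1.0, 1.0)] * d + [(None, 1.0)],  # keep the LP bounded
        )
        return bool(res.success and res.x[-1] > 1e-9)

    # Can one greedy linear policy pick action 0 in state 0 AND action 1 in state 1?
    print(consistent([(0, 0), (1, 1)], phi, N_ACTIONS))

A positive optimal margin certifies that the listed greedy choices share a witness θ; in the paper's algorithms, backups are partitioned over such consistent information sets instead of taking an unconstrained per-state max.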