reproducibilityindex.ai

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

Authors: Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

AAAI 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	3.1 Numerical Illustration: We illustrate the importance of the ETD(0, β) bias bound in a numerical example. 4.1 Numerical Illustration: We revisit the 2-state MDP described in Section 3.1, with γ = 0.9, ε = 0.2 and p = 0.95. For these parameter settings, the error of standard TD is 42.55 (p was chosen to be close to a point of infinite bias for these parameters). In Figure 2 we plot the mean-squared error $\|\Phi^\top \theta - V^\pi\|_{d_\pi}$, where θ was obtained by running ETD(0, β) with a step size α = 0.001 for 10,000 iterations, and averaging the results over 10,000 different runs.
Researcher Affiliation	Collaboration	Assaf Hallak (Technion, Israel, ifogph@gmail.com); Aviv Tamar (UC Berkeley, USA, avivt@berkeley.edu); Remi Munos (Google Deep Mind, UK, munos@google.com); Shie Mannor (Technion, Israel, shie@ee.technion.ac.il)
Pseudocode	No	The paper provides mathematical equations describing the algorithm (e.g., equation (1)), but it does not present them in a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code	No	The paper does not provide any statement or link indicating that source code for the described methodology is publicly available.
Open Datasets	No	The paper uses a '2-state MDP example of Kolter (2011)' and defines its parameters (P, gamma, V, Phi) within the text, but does not provide a specific link, DOI, or formal citation for a publicly available dataset file.
Dataset Splits	No	The paper describes numerical simulations for an MDP and mentions running the algorithm for a number of iterations and averaging runs, but it does not specify explicit training, validation, or test dataset splits.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the numerical illustrations or experiments.
Software Dependencies	No	The paper does not provide specific software dependencies or version numbers for any libraries, frameworks, or tools used in its work.
Experiment Setup	Yes	In Figure 2 we plot the mean-squared error $\|\Phi^\top \theta - V^\pi\|_{d_\pi}$, where θ was obtained by running ETD(0, β) with a step size α = 0.001 for 10,000 iterations, and averaging the results over 10,000 different runs.
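
Since the paper ships neither pseudocode nor source code, the following is a minimal sketch of how the experiment quoted in the Experiment Setup row might be re-run. It is not the authors' implementation. Only γ = 0.9, ε = 0.2, p = 0.95, α = 0.001 and the iteration count come from the quoted text. The transition matrices, value function, features, the treatment of "action" as the choice of next state, and the choice to average the error (rather than θ) across runs are all assumptions made for illustration, as are the function names.

```python
import numpy as np

# ASSUMED reconstruction of the 2-state example; only gamma, eps, p, alpha and
# the iteration count are quoted on this page. Everything else is illustrative.
GAMMA, EPS, P_BEHAVIOR = 0.9, 0.2, 0.95
P_PI = np.full((2, 2), 0.5)                            # target transitions (assumed)
P_MU = np.array([[P_BEHAVIOR, 1.0 - P_BEHAVIOR]] * 2)  # behavior transitions (assumed)
V = np.array([1.0, 1.05])                              # true value function (assumed)
PHI = np.array([[1.0], [1.05 + EPS]])                  # one feature per state (assumed)
R = (np.eye(2) - GAMMA * P_PI) @ V                     # rewards consistent with V
D_PI = np.array([0.5, 0.5])                            # stationary distribution of P_PI
D_MU = P_MU[0]                                         # stationary distribution of P_MU


def td_fixed_point(d):
    """Closed-form TD(0) solution theta = A^{-1} b with states weighted by d."""
    D = np.diag(d)
    A = PHI.T @ D @ (np.eye(2) - GAMMA * P_PI) @ PHI
    b = PHI.T @ D @ R
    return np.linalg.solve(A, b)


def etd0_beta(beta, alpha, n_steps, rng):
    """One off-policy run of the ETD(0, beta) update; returns the final theta."""
    theta = np.zeros(PHI.shape[1])
    s = 0 if rng.random() < D_MU[0] else 1             # start from d_mu
    F = 1.0                                            # follow-on trace, F_0 = 1
    for _ in range(n_steps):
        # Two-state shortcut for sampling the next state under the behavior chain.
        s_next = 0 if rng.random() < P_MU[s, 0] else 1
        rho = P_PI[s, s_next] / P_MU[s, s_next]        # importance ratio (assumed form)
        delta = R[s] + GAMMA * PHI[s_next] @ theta - PHI[s] @ theta
        theta = theta + alpha * F * rho * delta * PHI[s]
        F = beta * rho * F + 1.0                       # F_{t+1} = beta * rho_t * F_t + 1
        s = s_next
    return theta


def weighted_error(theta, d):
    """d-weighted norm of (Phi theta - V)."""
    diff = PHI @ theta - V
    return float(np.sqrt(np.sum(d * diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Reduced from the quoted 10,000 iterations x 10,000 runs to keep this quick.
    n_runs, n_steps, alpha = 20, 2000, 0.001
    for beta in (0.0, 0.2, 0.4):
        errors = [weighted_error(etd0_beta(beta, alpha, n_steps, rng), D_PI)
                  for _ in range(n_runs)]
        print(f"beta={beta:.1f}  mean error={np.mean(errors):.3f}")
```

Under these assumed parameters, `td_fixed_point(D_MU)` yields an unweighted Euclidean error of roughly 42.55, which agrees with the standard-TD figure quoted in the Research Type row. That agreement supports the assumed MDP, but it does not settle which norm convention the paper's Figure 2 uses.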