Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

Authors: Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3.1 Numerical Illustration: We illustrate the importance of the ETD(0, β) bias bound in a numerical example. 4.1 Numerical Illustration: We revisit the 2-state MDP described in Section 3.1, with γ = 0.9, ε = 0.2 and p = 0.95. For these parameter settings, the error of standard TD is 42.55 (p was chosen to be close to a point of infinite bias for these parameters). In Figure 2 we plot the mean-squared error ‖Φ^⊤θ − V^π‖_{d_π}, where θ was obtained by running ETD(0, β) with a step size α = 0.001 for 10,000 iterations, and averaging the results over 10,000 different runs.
Researcher Affiliation | Collaboration | Assaf Hallak (Technion, Israel, ifogph@gmail.com); Aviv Tamar (UC Berkeley, USA, avivt@berkeley.edu); Remi Munos (Google DeepMind, UK, munos@google.com); Shie Mannor (Technion, Israel, shie@ee.technion.ac.il)
Pseudocode | No | The paper provides mathematical equations describing the algorithm (e.g., equation (1)), but it does not present them in a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper uses a '2-state MDP example of Kolter (2011)' and defines its parameters (P, γ, V, Φ) within the text, but does not provide a specific link, DOI, or formal citation for a publicly available dataset file.
Dataset Splits | No | The paper describes numerical simulations for an MDP and mentions running the algorithm for a number of iterations and averaging over runs, but it does not specify explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the numerical illustrations or experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for any libraries, frameworks, or tools used in its work.
Experiment Setup | Yes | In Figure 2 we plot the mean-squared error ‖Φ^⊤θ − V^π‖_{d_π}, where θ was obtained by running ETD(0, β) with a step size α = 0.001 for 10,000 iterations, and averaging the results over 10,000 different runs.
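Since the paper ships neither pseudocode nor source code, the experiment setup quoted above can only be sketched. The Python below is one possible reading, not the authors' implementation. It assumes the commonly stated ETD(0, β) update with follow-on trace F_n = β ρ_{n-1} F_{n-1} + 1 and F_0 = 1. It also assumes a hypothetical 2-state chain in which the target policy moves to either state with probability 0.5, the behavior policy moves to the first state with probability p, the feature vector is (1, 1.05 + ε), the true values are (1, 1.05), and rewards are derived from the Bellman equation. None of these specifics are confirmed by the text quoted on this page, and all function names are illustrative.

```python
import random


def next_trace(trace, rho, beta):
    """Follow-on trace update: F_{n+1} = beta * rho_n * F_n + 1 (assumed form)."""
    return beta * rho * trace + 1.0


def etd_step(theta, trace, rho, reward, phi_s, phi_next, gamma, alpha):
    """One ETD(0, beta) update for a scalar weight theta (single feature)."""
    td_error = reward + gamma * theta * phi_next - theta * phi_s
    return theta + alpha * trace * rho * td_error * phi_s


def weighted_sq_error(theta, features, values, weights):
    """Squared error between theta * phi(s) and V(s), weighted per state."""
    return sum(w * (theta * f - v) ** 2
               for f, v, w in zip(features, values, weights))


def run_etd(beta, gamma=0.9, eps=0.2, p=0.95, alpha=0.001,
            n_iters=10_000, seed=0):
    """Single run on a hypothetical 2-state chain (all MDP details assumed)."""
    rng = random.Random(seed)
    features = [1.0, 1.05 + eps]   # assumed feature per state
    values = [1.0, 1.05]           # assumed true value function
    target = [0.5, 0.5]            # assumed target next-state probabilities
    behavior = [p, 1.0 - p]        # assumed behavior next-state probabilities
    # Rewards chosen so that `values` satisfies the target-policy Bellman equation.
    expected_next = sum(t * v for t, v in zip(target, values))
    rewards = [v - gamma * expected_next for v in values]

    theta, trace, state = 0.0, 1.0, 0
    for _ in range(n_iters):
        nxt = 0 if rng.random() < behavior[0] else 1
        rho = target[nxt] / behavior[nxt]   # importance-sampling ratio
        theta = etd_step(theta, trace, rho, rewards[state],
                         features[state], features[nxt], gamma, alpha)
        trace = next_trace(trace, rho, beta)
        state = nxt
    # Error weighted by the (assumed uniform) target stationary distribution.
    return weighted_sq_error(theta, features, values, target)
```

Averaging over runs, as the setup describes, would then amount to calling run_etd with different seeds for each value of β and taking the mean. Under the assumed update, setting beta to 0 keeps the trace at 1 and reduces the step to importance-weighted TD(0).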