Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
Authors: Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3.1 (Numerical Illustration): We illustrate the importance of the ETD(0, β) bias bound in a numerical example. Section 4.1 (Numerical Illustration): We revisit the 2-state MDP described in Section 3.1, with γ = 0.9, ε = 0.2 and p = 0.95. For these parameter settings, the error of standard TD is 42.55 (p was chosen to be close to a point of infinite bias for these parameters). In Figure 2 we plot the mean-squared error $\lVert \Phi\theta - V^{\pi} \rVert_{d_{\pi}}$, where θ was obtained by running ETD(0, β) with a step size α = 0.001 for 10,000 iterations, and averaging the results over 10,000 different runs. |
| Researcher Affiliation | Collaboration | Assaf Hallak (Technion, Israel; ifogph@gmail.com); Aviv Tamar (UC Berkeley, USA; avivt@berkeley.edu); Remi Munos (Google DeepMind, UK; munos@google.com); Shie Mannor (Technion, Israel; shie@ee.technion.ac.il) |
| Pseudocode | No | The paper provides mathematical equations describing the algorithm (e.g., equation (1)), but it does not present them in a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses the '2-state MDP example of Kolter (2011)' and defines its parameters (P, γ, V, Φ) in the text, but does not provide a link, DOI, or formal citation for a publicly available dataset file. |
| Dataset Splits | No | The paper describes numerical simulations for an MDP and mentions running the algorithm for a number of iterations and averaging runs, but it does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the numerical illustrations or experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for any libraries, frameworks, or tools used in its work. |
| Experiment Setup | Yes | In Figure 2 we plot the mean-squared error $\lVert \Phi\theta - V^{\pi} \rVert_{d_{\pi}}$, where θ was obtained by running ETD(0, β) with a step size α = 0.001 for 10,000 iterations, and averaging the results over 10,000 different runs. |
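The experiment protocol described in the Research Type and Experiment Setup rows (run ETD(0, β) with step size α for a fixed number of iterations, then average a d_π-weighted error over independent runs) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the generalized emphatic update takes the form F_t = β ρ_{t−1} F_{t−1} + 1 and θ_{t+1} = θ_t + α F_t ρ_t δ_t φ(S_t), with δ_t = R_t + γ θ_tᵀφ(S_{t+1}) − θ_tᵀφ(S_t); this is our reading, since the paper's equation (1) is not reproduced in this report. The 2-state MDP, rewards, features and behavior policy (`PI`, `MU`, `R`, `PHI`, `GAMMA`) are hypothetical placeholders, not the Kolter (2011) example, whose P, V and Φ are not given here. The helper names `etd0_beta_run` and `mean_error` are likewise illustrative, and the sketch averages the squared weighted error per run, which is one possible reading of "averaging the results".

```python
import random

# HYPOTHETICAL 2-state example (not the paper's MDP). Each "action" is a choice
# of next state, so PI[s][s2] / MU[s][s2] give target / behavior probabilities
# of moving s -> s2, and PI doubles as the target transition matrix.
GAMMA = 0.9
PI = [[0.5, 0.5], [0.5, 0.5]]   # hypothetical target policy
MU = [[0.7, 0.3], [0.3, 0.7]]   # hypothetical behavior policy
R = [0.0, 1.0]                  # hypothetical reward received when leaving s
PHI = [[1.0], [1.05]]           # hypothetical single feature per state


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def true_values(P, R, gamma, sweeps=2000):
    """V^pi via repeated Bellman backups V <- R + gamma * P V."""
    V = [0.0] * len(R)
    for _ in range(sweeps):
        V = [R[s] + gamma * dot(P[s], V) for s in range(len(R))]
    return V


def stationary(P, sweeps=2000):
    """d_pi via power iteration d <- d P (assumes an ergodic chain)."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(sweeps):
        d = [sum(d[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return d


def etd0_beta_run(beta, alpha, n_steps, rng):
    """One run of the ASSUMED ETD(0, beta) update along a trajectory drawn from MU."""
    theta = [0.0] * len(PHI[0])
    s = rng.randrange(len(PHI))      # start state drawn uniformly (an assumption)
    F, rho_prev = 0.0, 0.0           # gives F = 1 on the first step
    for _ in range(n_steps):
        s2 = rng.choices(range(len(PHI)), weights=MU[s])[0]
        rho = PI[s][s2] / MU[s][s2]  # importance-sampling ratio pi/mu
        F = beta * rho_prev * F + 1.0  # follow-on trace, beta in place of gamma
        delta = R[s] + GAMMA * dot(theta, PHI[s2]) - dot(theta, PHI[s])
        theta = [th + alpha * F * rho * delta * f for th, f in zip(theta, PHI[s])]
        s, rho_prev = s2, rho
    return theta


def weighted_error(theta, V, d):
    """Squared d_pi-weighted error: sum_s d(s) * (phi(s)^T theta - V(s))^2."""
    return sum(d[s] * (dot(PHI[s], theta) - V[s]) ** 2 for s in range(len(V)))


def mean_error(beta, alpha=0.001, n_steps=10_000, n_runs=20, seed=0):
    """Average the weighted error over independent runs (the paper reports 10,000 runs)."""
    rng = random.Random(seed)
    V = true_values(PI, R, GAMMA)
    d = stationary(PI)
    errs = [weighted_error(etd0_beta_run(beta, alpha, n_steps, rng), V, d)
            for _ in range(n_runs)]
    return sum(errs) / n_runs
```

Under the assumed update, `mean_error(beta=GAMMA)` corresponds to ETD(0) with unit interest and `mean_error(beta=0.0)` to importance-weighted off-policy TD(0), so sweeping β between them mimics the shape of the Figure 2 experiment. The default `n_runs` is kept small because a pure-Python loop over 10,000 runs of 10,000 steps is slow; raise it to match the reported setting.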