Expected Eligibility Traces
Authors: Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa
AAAI 2021, pp. 9997–10005
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From the "Empirical Analysis" section: "From the insights above, we expect that ET(λ) yields lower prediction errors because it has lower variance and aggregates information across episodes better. In this section we empirically investigate expected traces in several experiments." |
| Researcher Affiliation | Collaboration | Hado van Hasselt (1), Sephora Madjiheurem (2), Matteo Hessel (1), David Silver (1), André Barreto (1), Diana Borsa (1); 1 DeepMind, 2 University College London, UK |
| Pseudocode | Yes | Algorithm 1 ET(λ) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We tested this idea on two canonical Atari games: Pong and Ms. Pac-Man. The results in Figure 6 show that the expected traces helped speed up learning compared to the baseline which uses accumulating traces, for various step sizes. ... Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The Arcade Learning Environment: An Evaluation Platform for General Agents. J. Artif. Intell. Res. (JAIR) 47: 253–279. |
| Dataset Splits | No | The paper does not provide specific details regarding dataset splits for training, validation, or testing used in its experiments. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software like JAX and Haiku, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We found that being able to track this rather quickly improved performance: the expected trace parameters Θ in the following experiment were updated with a relatively high step size of β = 0.1. ... All results are for λ = 0.95. Further implementation details and hyper-parameters are in the appendix. |
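The table references Algorithm 1 ET(λ) along with the step sizes β = 0.1 (for the expected-trace parameters Θ) and λ = 0.95. As a rough illustration of the core idea, the following is a hedged tabular sketch: a learned expected trace z(s) ≈ E[e | S = s] is trained toward the sampled accumulating trace e with step size β, and the value update then uses z(s) in place of e. All names (`et_lambda`, the episode tuple format) are our own illustrative choices, not the paper's implementation, which uses JAX/Haiku function approximation.

```python
import numpy as np

def et_lambda(episodes, n_states, alpha=0.1, beta=0.1, gamma=0.99, lam=0.95):
    """Tabular sketch of expected eligibility traces for prediction.

    episodes: iterable of episodes, each a list of (s, r, s_next, done).
    Returns the learned value estimates v.
    """
    v = np.zeros(n_states)              # value estimates
    z = np.zeros((n_states, n_states))  # expected trace per state (tabular Theta)
    for episode in episodes:
        e = np.zeros(n_states)          # sampled accumulating trace
        for s, r, s_next, done in episode:
            e = gamma * lam * e
            e[s] += 1.0                 # gradient of v(s) w.r.t. tabular weights
            # train the expected trace toward the sampled trace (step size beta)
            z[s] += beta * (e - z[s])
            td = r + (0.0 if done else gamma * v[s_next]) - v[s]
            v += alpha * td * z[s]      # update with the expected trace, not e
    return v

# usage on a toy two-state episode
episode = [(0, 0.0, 1, False), (1, 1.0, 1, True)]
values = et_lambda([episode], n_states=2)
```

The key difference from standard TD(λ) is the single substituted line: the value update multiplies the TD error by the learned z(s) rather than the instantaneous trace e, which is what lets information aggregate across episodes.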