On Convergence of Gradient Expected Sarsa(λ)
Authors: Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
AAAI 2021, pp. 10621-10629
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, our experiments verify the effectiveness of our GES(λ). For the details of proof, please refer to https://arxiv.org/pdf/2012.07199.pdf. |
| Researcher Affiliation | Academia | ¹College of Computer Science and Technology, Zhejiang University, China. ²School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. |
| Pseudocode | Yes | Algorithm 1 Gradient Expected Sarsa(λ) (GES(λ)) |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the methodology. |
| Open Datasets | Yes | In this section, we test the capacity of GES(λ) for off-policy evaluation in three typical domains: Mountain Car, Baird Star (Baird 1995), Two-State MDP (Touati et al. 2018). |
| Dataset Splits | No | The paper does not provide specific details on training/test/validation dataset splits, nor does it explicitly mention a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using an 'open tile coding software' but does not specify its version number. No other software dependencies with specific version numbers are provided. |
| Experiment Setup | Yes | As suggested by Sutton and Barto (2018), we set all the initial parameters to be 0, which is optimistic about causing extensive exploration... We set λ = 0.99, γ = 0.99 in all the experiments. The MSPBE/MSE distribution is computed over the combination of step-size, $(\alpha_t, \beta_t/\alpha_t) \in [0.1 \times 2^j \mid j = -10, -9, \ldots, -1, 0]^2$. |
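
The step-size sweep quoted in the Experiment Setup row can be enumerated as a small grid. The following is a minimal sketch, not the authors' code: it assumes the exponents j run from -10 to 0 and that the second coordinate of each pair is the ratio β_t/α_t (the extracted text is ambiguous on both points). The function names `step_size_grid` and `step_size_pairs` are hypothetical and chosen only for illustration.

```python
import itertools

# Hyperparameters quoted in the Experiment Setup row.
LAMBDA = 0.99
GAMMA = 0.99


def step_size_grid(base=0.1, min_exp=-10, max_exp=0):
    """Return the candidate values base * 2**j for j = min_exp, ..., max_exp.

    Assumption: exponents are non-positive, so values lie in (0, base].
    """
    return [base * 2.0 ** j for j in range(min_exp, max_exp + 1)]


def step_size_pairs():
    """Return every (alpha, beta) pair in the squared grid.

    Assumption: the grid is over (alpha, beta / alpha), so beta is
    recovered as ratio * alpha.
    """
    values = step_size_grid()
    pairs = []
    for alpha, ratio in itertools.product(values, repeat=2):
        pairs.append((alpha, ratio * alpha))
    return pairs


if __name__ == "__main__":
    pairs = step_size_pairs()
    print(len(pairs))  # 11 grid values squared
```

Under these assumptions the sweep covers 11 values per coordinate, so each reported MSPBE/MSE distribution would aggregate 121 step-size combinations.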