On Convergence of Gradient Expected Sarsa(λ)

Authors: Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
Pages: 10621-10629

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, our experiments verify the effectiveness of our GES(λ). For details of the proofs, please refer to https://arxiv.org/pdf/2012.07199.pdf.
Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Zhejiang University, China. (2) School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.
Pseudocode | Yes | Algorithm 1: Gradient Expected Sarsa(λ) (GES(λ))
Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the methodology.
Open Datasets | Yes | In this section, we test the capacity of GES(λ) for off-policy evaluation in three typical domains: Mountain Car, Baird Star (Baird 1995), and Two-State MDP (Touati et al. 2018).
Dataset Splits | No | The paper does not provide specific details on training/validation/test splits, nor does it explicitly mention a validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using "open tile coding software" but does not specify its version. No other software dependencies with version numbers are given.
Experiment Setup | Yes | As suggested by Sutton and Barto (2018), we set all the initial parameters to 0, which is optimistic and causes extensive exploration... We set λ = 0.99 and γ = 0.99 in all the experiments. The MSPBE/MSE distribution is computed over all step-size combinations $(\alpha_t, \beta_t/\alpha_t) \in \{0.1 \cdot 2^{j} \mid j = -10, -9, \ldots, -1, 0\}^2$.
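The step-size sweep in the Experiment Setup row can be enumerated as below. This is a minimal sketch under two assumptions: the second coordinate is read as the ratio beta_t/alpha_t (the extracted text appears to have lost a fraction bar), and j runs over -10, ..., 0 (the minus signs appear to have been dropped). The function name `step_size_grid` is hypothetical.

```python
import itertools


def step_size_grid(j_min=-10, j_max=0, base=0.1):
    """Enumerate (alpha, beta) pairs for the assumed sweep.

    Assumption: each coordinate of (alpha, beta/alpha) takes the values
    base * 2**j for j = j_min, ..., j_max; beta is recovered as ratio * alpha.
    """
    values = [base * 2.0 ** j for j in range(j_min, j_max + 1)]
    pairs = []
    for alpha, ratio in itertools.product(values, repeat=2):
        # beta is the secondary step size, tied to alpha through the ratio.
        pairs.append((alpha, ratio * alpha))
    return pairs


grid = step_size_grid()
print(len(grid))  # 121 configurations (11 values per coordinate)
```

Each of the 121 (alpha, beta) pairs would then be run once, and its final MSPBE or MSE recorded, which yields the per-configuration distribution the setup describes.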
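The Experiment Setup row reports MSPBE and MSE. On a small, fully known MDP both can be computed in closed form. The sketch below uses the standard linear-function-approximation definition MSPBE(θ) = E_d[δφ]^T E_d[φφ^T]^{-1} E_d[δφ] over state-action pairs, with the expected-Sarsa Bellman backup r + γ P_π q. It assumes that MSE means the d-weighted squared error against the true action values q^π; the paper's exact weighting is not stated in this summary, and the function name and argument layout are hypothetical.

```python
import numpy as np


def mspbe_and_mse(theta, Phi, P_pi, r, d, gamma=0.99):
    """Closed-form MSPBE and MSE for linear action-value estimates (sketch).

    Phi:  (n, k) feature matrix, one row per state-action pair
    P_pi: (n, n) matrix with P_pi[(s,a), (s',a')] = p(s'|s,a) * pi(a'|s')
    r:    (n,) expected immediate reward of each state-action pair
    d:    (n,) behavior distribution over state-action pairs (weights)
    """
    D = np.diag(d)
    q_hat = Phi @ theta
    # Expected TD error of each pair under the target-policy (expected-Sarsa) backup.
    delta = r + gamma * P_pi @ q_hat - q_hat
    e_delta_phi = Phi.T @ D @ delta  # E_d[delta * phi]
    C = Phi.T @ D @ Phi              # E_d[phi phi^T]
    mspbe = float(e_delta_phi @ np.linalg.solve(C, e_delta_phi))
    # True action values solve q = r + gamma * P_pi q.
    q_true = np.linalg.solve(np.eye(len(r)) - gamma * P_pi, r)
    mse = float(d @ (q_hat - q_true) ** 2)
    return mspbe, mse
```

At the linear TD fixed point θ*, which solves Φ^T D (Φ - γ P_π Φ) θ = Φ^T D r, the MSPBE is zero while the MSE is generally not, which is why the two metrics are reported separately.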
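The Pseudocode row points to Algorithm 1 (GES(λ)), which is not reproduced in this summary. As background only, the sketch below computes the one-step expected-Sarsa TD error with linear features, δ_t = r_{t+1} + γ Σ_a π(a|s_{t+1}) θ^T φ(s_{t+1}, a) - θ^T φ(s_t, a_t), the kind of target an expected-Sarsa method bootstraps from. It is not the paper's algorithm: eligibility traces (λ), the gradient-correction weights, and terminal-state handling are all omitted, and the function name is hypothetical.

```python
import numpy as np


def expected_sarsa_td_error(theta, phi_sa, reward, phi_next, pi_next, gamma=0.99):
    """One-step expected-Sarsa TD error with linear function approximation.

    theta:    weight vector, shape (d,)
    phi_sa:   feature vector of the current state-action pair, shape (d,)
    reward:   scalar reward r_{t+1}
    phi_next: features of the next state, one row per action, shape (|A|, d)
    pi_next:  target-policy probabilities pi(a | s_{t+1}), shape (|A|,)
    """
    q_next = phi_next @ theta             # Q(s', a) for every action a
    expected_q = float(pi_next @ q_next)  # sum_a pi(a|s') Q(s', a)
    return reward + gamma * expected_q - float(phi_sa @ theta)


# With all weights initialized to 0, as in the setup, delta equals the reward.
print(expected_sarsa_td_error(np.zeros(2), np.array([1.0, 0.0]), -1.0,
                              np.eye(2), np.array([0.5, 0.5])))  # -1.0
```

Averaging over the target policy's next action, rather than sampling it, is what distinguishes the expected-Sarsa target from the plain Sarsa target.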