PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method
Authors: Ziwei Guan, Tengyu Xu, Yingbin Liang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments validate the superior performance of PER-ETD and its advantage over ETD. |
| Researcher Affiliation | Academia | Ziwei Guan, Tengyu Xu & Yingbin Liang; Department of Electrical and Computer Engineering, Ohio State University, Columbus, OH 43210, USA |
| Pseudocode | Yes | Algorithm 1: PER-ETD(0); Algorithm 2: PER-ETD(λ) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described. |
| Open Datasets | Yes | We consider the BAIRD counter-example. The details of the MDP setting and the behavior and target policies can be found in Appendix A.1. The BAIRD counter-example is illustrated in Figure 4, which has 7 states and 2 actions... We choose the target policy as π(0|s) = 0.1 and π(1|s) = 0.9 for all states, and choose the behavior policy as µ(0|s) = 6/7 and µ(1|s) = 1/7 for all states. Moreover, we specify the discount factor γ = 0.99. |
| Dataset Splits | No | The paper describes the MDP environment and policies but does not provide specific training/validation/test dataset splits. For this type of reinforcement learning research, data is generated through interaction with the defined environment rather than from pre-collected fixed splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We adopt a constant learning rate for both PER-ETD(0) and PER-ETD(λ), and all experiments take an average over 20 random initializations. We set the stepsize η = 2⁻⁹ for all algorithms for fair comparison. For PER-ETD(0), we adopt one-dimensional features Φ1 = (0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.37). |
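
The policies listed in the Open Datasets row already determine the per-step importance-sampling ratios ρ(a) = π(a|s)/µ(a|s) for this BAIRD setup. The sketch below computes them and, assuming the standard ETD(0) follow-on trace F_t = γ·ρ_{t-1}·F_{t-1} + 1 with unit interest and F_0 = 1 (a textbook definition, not spelled out on this page), the exact first and second moments of that trace. Because both policies ignore the state, each ρ_{t-1} is independent of F_{t-1}, so the moments obey exact linear recursions. All function and constant names are hypothetical, and the sketch only illustrates the variance behavior of plain ETD's trace; it does not implement PER-ETD.

```python
from fractions import Fraction

# Policies and discount exactly as stated for the BAIRD setup (state-independent).
PI = {0: Fraction(1, 10), 1: Fraction(9, 10)}  # target policy pi(a|s)
MU = {0: Fraction(6, 7), 1: Fraction(1, 7)}    # behavior policy mu(a|s)
GAMMA = Fraction(99, 100)                      # discount factor


def is_ratios(pi, mu):
    """Importance-sampling ratio rho(a) = pi(a|s) / mu(a|s) for each action."""
    return {a: pi[a] / mu[a] for a in pi}


def ratio_moments(pi, mu):
    """Return (E_mu[rho], E_mu[rho^2]) under the behavior policy."""
    rho = is_ratios(pi, mu)
    m1 = sum(mu[a] * rho[a] for a in mu)
    m2 = sum(mu[a] * rho[a] ** 2 for a in mu)
    return m1, m2


def followon_moments(steps, gamma=GAMMA, pi=PI, mu=MU):
    """Exact (E[F_t], E[F_t^2]) for t = 0..steps of the assumed ETD(0) trace
    F_t = gamma * rho_{t-1} * F_{t-1} + 1 with F_0 = 1 (unit interest assumed).
    Valid here because rho_{t-1} depends only on an action drawn from a
    state-independent mu, hence is independent of F_{t-1}."""
    m1, m2 = ratio_moments(pi, mu)
    e1, e2 = Fraction(1), Fraction(1)
    history = [(e1, e2)]
    for _ in range(steps):
        # E[F_t]   = gamma * E[rho] * E[F_{t-1}] + 1
        # E[F_t^2] = gamma^2 * E[rho^2] * E[F_{t-1}^2] + 2 * gamma * E[rho] * E[F_{t-1}] + 1
        # (tuple assignment: both right-hand sides use the previous e1, e2)
        e1, e2 = gamma * m1 * e1 + 1, gamma**2 * m2 * e2 + 2 * gamma * m1 * e1 + 1
        history.append((e1, e2))
    return history
```

Exact fractions are used so the moments can be compared without rounding. With these policies ρ(0) = 7/60 and ρ(1) = 6.3, so E_µ[ρ] = 1 while γ²·E_µ[ρ²] ≈ 5.57 > 1. The mean of F_t therefore approaches 1/(1 − γ) = 100, but its variance is multiplied by at least that factor at every step; `float(followon_moments(30)[-1][1])` shows how quickly the second moment blows up.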