PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method

Authors: Ziwei Guan, Tengyu Xu, Yingbin Liang

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments validate the superior performance of PER-ETD and its advantage over ETD.
Researcher Affiliation | Academia | Ziwei Guan, Tengyu Xu & Yingbin Liang, Department of Electrical and Computer Engineering, Ohio State University, Columbus, OH 43210, USA
Pseudocode | Yes | Algorithm 1 PER-ETD(0), Algorithm 2 PER-ETD(λ) (see the PER-ETD(0) sketch following this table)
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | We consider the BAIRD counter-example. The details of the MDP setting and behavior and target policies could be found in Appendix A.1. The BAIRD counter-example is illustrated in Figure 4, which has 7 states and 2 actions... We choose the target policy as π(0|s) = 0.1 and π(1|s) = 0.9 for all states; and choose the behavior policy as µ(0|s) = 6/7 and µ(1|s) = 1/7 for all states. Moreover, we specify the discount factor γ = 0.99. (An environment sketch follows this table.)
Dataset Splits | No | The paper describes the MDP environment and policies but does not provide specific training/validation/test dataset splits. For this type of reinforcement learning research, data are generated through interaction with the defined environment rather than drawn from pre-collected fixed splits.
Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We adopt a constant learning rate for both PER-ETD(0) and PER-ETD(λ), and all experiments take an average over 20 random initializations. We set the stepsize η = 2^-9 for all algorithms for fair comparison. For PER-ETD(0), we adopt one-dimensional features Φ_1 = (0.35, 0.35, 0.35, 0.35, 0.35, 0.35, 0.37)^T.
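
The Open Datasets row describes the Baird counter-example only in prose. Below is a minimal Python/NumPy sketch of that environment and of the quoted behavior and target policies. Only the policy probabilities and the discount factor γ = 0.99 come from the quoted text; the transition structure (action 0, the "dashed" action, jumping uniformly to the first six states, and action 1, the "solid" action, jumping to the seventh) follows the standard Baird construction and is an assumption here, as are all function and variable names.

```python
# Sketch of the Baird counter-example environment and the quoted policies
# (7 states, 2 actions, gamma = 0.99); the dashed/solid transition structure
# is assumed from the standard Baird construction, not quoted from the paper.
import numpy as np

NUM_STATES = 7
GAMMA = 0.99

# Target policy pi and behavior policy mu, identical across all states:
# pi(0|s) = 0.1, pi(1|s) = 0.9;  mu(0|s) = 6/7, mu(1|s) = 1/7.
PI = np.array([0.1, 0.9])
MU = np.array([6.0 / 7.0, 1.0 / 7.0])


def step(state: int, action: int, rng: np.random.Generator) -> tuple[int, float]:
    """One transition of the (assumed) Baird dynamics; all rewards are zero."""
    if action == 0:                      # "dashed": uniform over the first six states
        next_state = rng.integers(0, 6)
    else:                                # "solid": always the seventh state
        next_state = 6
    return int(next_state), 0.0


def sample_action(rng: np.random.Generator) -> tuple[int, float]:
    """Draw an action from the behavior policy and return its importance ratio pi/mu."""
    action = int(rng.choice(2, p=MU))
    rho = PI[action] / MU[action]
    return action, rho
```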
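Algorithm 1 (PER-ETD(0)) is only named in the Pseudocode row. The sketch below illustrates the core idea behind it, an emphatic TD(0) update whose follow-on trace is restarted at the start of every period, reusing the environment helpers from the previous sketch together with the quoted hyperparameters (stepsize 2^-9, the one-dimensional features, 20 seeds). The trace's restart value, the choice to apply a single parameter update at the end of each period, and the example period length and iteration count are assumptions; consult the paper's Algorithm 1 for the exact pseudocode.

```python
# PER-ETD(0)-style sketch: the follow-on trace is restarted every `period`
# steps, which is the mechanism behind the periodic restart in Algorithm 1.
# Reuses GAMMA, NUM_STATES, step() and sample_action() from the sketch above.
import numpy as np

ETA = 2.0 ** -9                        # constant stepsize from the quoted setup
NUM_SEEDS = 20                         # experiments are averaged over 20 seeds
# Quoted one-dimensional feature for each of the 7 states.
PHI = np.array([[0.35], [0.35], [0.35], [0.35], [0.35], [0.35], [0.37]])


def per_etd0(period: int, num_iterations: int, seed: int) -> np.ndarray:
    """Run `num_iterations` PER-ETD(0)-style iterations and return the weights."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(PHI.shape[1])
    state = int(rng.integers(NUM_STATES))
    for _ in range(num_iterations):
        follow_on = 1.0                # restart the trace each period (assumed restart value)
        for _ in range(period):
            action, rho = sample_action(rng)
            next_state, reward = step(state, action, rng)
            # Standard emphatic TD(0) quantities for this transition.
            td_error = reward + GAMMA * float(PHI[next_state] @ theta) - float(PHI[state] @ theta)
            candidate = ETA * follow_on * rho * td_error * PHI[state]
            follow_on = GAMMA * rho * follow_on + 1.0
            state = next_state
        # Apply one emphatic update per period (a sketch choice, not the paper's exact rule).
        theta = theta + candidate
    return theta


# Example run averaged over 20 seeds; the period and iteration count are placeholders.
weights = np.mean(
    [per_etd0(period=4, num_iterations=5000, seed=s) for s in range(NUM_SEEDS)], axis=0
)
```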