reproducibilityindex.ai

Backstepping Temporal Difference Learning

Authors: Han-Dong Lim, Donghwan Lee

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We verify the performance and convergence of the proposed BTD under standard benchmarks to evaluate off-policy TD-learning algorithms, including Baird environment (Baird, 1995), Random Walk (Sutton et al., 2009) with different features, and Boyan chain (Boyan, 2002). The details about the environments are given in Appendix Section 7.7. From the experiments, we see how BTD behaves under different coefficient values in {−0.5, −0.25, 0, 0.25, 0.5}. We measure the Root Mean-Squared Projected Bellman Error (RMSPBE) as the performance metric, and all results are averaged over 100 runs.
Researcher Affiliation	Academia	Han-Dong Lim Department of Electrical Engineering KAIST, Daejeon, 34141, South Korea limaries30@kaist.ac.kr Donghwan Lee Department of Electrical Engineering KAIST, Daejeon, 34141, South Korea donghwan@kaist.ac.kr
Pseudocode	Yes	With Algorithm 1 in Appendix, the iterates converge with probability one, as k → ∞, to the fixed point of (6). Consider Algorithm 2 in Appendix. Algorithm 5 in Appendix.
Open Source Code	No	The paper does not contain an explicit statement about the release of source code or a link to a code repository.
Open Datasets	Yes	We verify the performance and convergence of the proposed BTD under standard benchmarks to evaluate off-policy TD-learning algorithms, including Baird environment (Baird, 1995), Random Walk (Sutton et al., 2009) with different features, and Boyan chain (Boyan, 2002).
Dataset Splits	No	The paper mentions environments and benchmarks but does not specify training, validation, or test dataset splits, percentages, or methodology for partitioning the data.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing instances used for the experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers for replication.
Experiment Setup	Yes	From the experiments, we see how BTD behaves under different coefficient values in {−0.5, −0.25, 0, 0.25, 0.5}. We measure the Root Mean-Squared Projected Bellman Error (RMSPBE) as the performance metric, and all results are averaged over 100 runs. From Table 1: Backstepping TD, step-size = 0.01.
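
The RMSPBE metric named in the table can be illustrated with a short sketch. It assumes the commonly used definition for linear function approximation, MSPBE(θ) = (b − Aθ)ᵀ C⁻¹ (b − Aθ) with A = Φᵀ D (I − γP) Φ, b = Φᵀ D r, and C = Φᵀ D Φ; the paper's exact convention may differ. The function names `rmspbe` and `average_over_runs` and all argument names are hypothetical, chosen for illustration only, and none of this is taken from the authors' code.

```python
import numpy as np


def rmspbe(theta, Phi, P, r, d, gamma):
    """Root mean-squared projected Bellman error for a linear value estimate.

    Assumed inputs (illustrative, not from the paper):
      theta : (m,) weight vector
      Phi   : (n, m) feature matrix with full column rank
      P     : (n, n) state transition matrix under the target policy
      r     : (n,) expected one-step reward per state
      d     : (n,) strictly positive state weighting (behavior distribution)
      gamma : discount factor
    """
    D = np.diag(d)
    A = Phi.T @ D @ (np.eye(len(d)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    C = Phi.T @ D @ Phi  # invertible when Phi has full column rank and d > 0
    residual = b - A @ theta
    # Solve C x = residual instead of forming the inverse explicitly.
    return float(np.sqrt(residual @ np.linalg.solve(C, residual)))


def average_over_runs(curves):
    """Average per-step error curves across independent runs (e.g. 100 runs)."""
    return np.mean(np.asarray(curves, dtype=float), axis=0)


# Tiny two-state example with tabular features.
Phi = np.eye(2)
P = np.array([[0.0, 1.0], [1.0, 0.0]])
r = np.array([1.0, 0.0])
d = np.array([0.5, 0.5])
gamma = 0.9

error_at_zero = rmspbe(np.zeros(2), Phi, P, r, d, gamma)

# The weights solving A theta = b should drive the error to (numerically) zero.
A = Phi.T @ np.diag(d) @ (np.eye(2) - gamma * P) @ Phi
b = Phi.T @ np.diag(d) @ r
error_at_fixed_point = rmspbe(np.linalg.solve(A, b), Phi, P, r, d, gamma)

mean_curve = average_over_runs([[1.0, 0.5], [3.0, 1.5]])
```

In an experiment like the one summarized above, one such error curve would be recorded per run and the curves averaged element-wise across runs before plotting or tabulating.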