Target-Based Temporal-Difference Learning

Authors: Donghwan Lee, Niao He

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In addition, we provide some simulation results showing potentially superior convergence of these target-based TD algorithms compared to the standard TD-learning.
Researcher Affiliation | Academia | 1Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, USA 2Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, USA.
Pseudocode | Yes | Algorithm 1 Standard TD-Learning; Algorithm 2 Averaging TD-Learning (A-TD); Algorithm 3 Double TD-Learning (D-TD); Algorithm 4 Periodic TD-Learning (P-TD).
Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes a simulated MDP environment with specific parameters ('γ = 0.9, |S| = 10, ... and rπ(s) ∼ U[0, 20]') and a feature vector based on a cited work (Geramifard et al., 2013), but it does not provide concrete access information (e.g., a link, DOI, or specific repository name) for a publicly available or open dataset that was used.
Dataset Splits | No | The paper conducts simulations within a defined MDP environment and evaluates error evolution over iterations, but it does not specify explicit training, validation, or test dataset splits in terms of percentages, sample counts, or predefined dataset partitions.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with their version numbers, such as programming languages, libraries, or specialized solvers used in the experiments.
Experiment Setup | Yes | standard TD-learning ... with the step-size αk = 1000/(k + 10000), and the proposed A-TD ... with the step-size αk = 1000/(k + 10000) and δ = 0.9. ... we employ the adaptive step-size rule βk,t = (10000 · (0.997)^k)/(10000 + t) with Lk = 40 for P-TD, and the corresponding simulation results are given in Figure 3, where P-TD outperforms the standard TD with the step-size αk = 10000/(k + 10000), best tuned for comparison.
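The experiment-setup row above can be sketched in code. The step-size schedules αk = 1000/(k + 10000) and βk,t = 10000·(0.997)^k/(10000 + t) are quoted from the paper; everything else below (the chain-style random MDP, the feature dimension, the Polyak-style averaged target update, and the iteration count) is an illustrative assumption, not the paper's exact A-TD/D-TD/P-TD algorithms.

```python
import numpy as np

# Step-size schedules quoted from the paper's experiment setup.
def alpha(k):
    # alpha_k = 1000 / (k + 10000), used for standard TD and A-TD
    return 1000.0 / (k + 10000.0)

def beta(k, t):
    # beta_{k,t} = 10000 * (0.997)^k / (10000 + t), the adaptive
    # inner-loop rule for P-TD (grouping of terms is an assumption)
    return 10000.0 * (0.997 ** k) / (10000.0 + t)

# Toy target-based TD(0) with linear features on a random 10-state MDP,
# echoing the paper's setting gamma = 0.9, |S| = 10, r_pi(s) ~ U[0, 20].
rng = np.random.default_rng(0)
n_states, dim, gamma = 10, 5, 0.9
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
r = rng.uniform(0.0, 20.0, n_states)       # rewards drawn from U[0, 20]
Phi = rng.standard_normal((n_states, dim)) # assumed random feature matrix

theta = np.zeros(dim)    # online parameter
target = np.zeros(dim)   # slowly tracking target parameter
delta_mix = 0.9          # delta = 0.9, as quoted for A-TD
s = 0
for k in range(5000):
    s_next = rng.choice(n_states, p=P[s])
    # TD error evaluated against the *target* parameter
    td_err = r[s] + gamma * Phi[s_next] @ target - Phi[s] @ theta
    theta = theta + alpha(k) * td_err * Phi[s]
    # target averages toward the online parameter (assumed update form)
    target = target + delta_mix * alpha(k) * (theta - target)
    s = s_next
```

The target parameter decouples the bootstrap term from the online parameter, which is the common motivation for target-based TD; the averaging update shown is one plausible realization under the stated assumptions.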