Target-Based Temporal-Difference Learning
Authors: Donghwan Lee, Niao He
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition, we provide some simulation results showing potentially superior convergence of these target-based TD algorithms compared to the standard TD-learning. |
| Researcher Affiliation | Academia | Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, USA; Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, USA. |
| Pseudocode | Yes | Algorithm 1 Standard TD-Learning; Algorithm 2 Averaging TD-Learning (A-TD); Algorithm 3 Double TD-Learning (D-TD); Algorithm 4 Periodic TD-Learning (P-TD). (Hedged sketches of A-TD and P-TD follow the table.) |
| Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes a simulated MDP environment with specific parameters ('γ = 0.9, |S| = 10, ... and rπ(s) ∼ U[0, 20]') and a feature vector based on a cited work (Geramifard et al., 2013), but it does not provide concrete access information (e.g., a link, DOI, or specific repository name) for a publicly available or open dataset that was used. |
| Dataset Splits | No | The paper conducts simulations within a defined MDP environment and evaluates error evolution over iterations, but it does not specify explicit training, validation, or test dataset splits in terms of percentages, sample counts, or predefined dataset partitions. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers, such as programming languages, libraries, or specialized solvers used in the experiments. |
| Experiment Setup | Yes | standard TD-learning ... with the step-size αk = 1000/(k + 10000) and the proposed A-TD ... with αk = 1000/(k + 10000) and δ = 0.9. ... we employ the adaptive step-size rule βk,t = 10000 · (0.997)^k / (10000 + t) with Lk = 40 for P-TD, and the corresponding simulation results are given in Figure 3, where P-TD outperforms the standard TD with the step-size αk = 10000/(k + 10000), best tuned for comparison. (A hedged P-TD sketch appears after the table.) |
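
To make the reported setup concrete, below is a minimal sketch of target-based TD updates in the style of A-TD (Algorithm 2), using the quoted step size αk = 1000/(k + 10000) and δ = 0.9. The environment interface (`phi`, `sample_transition`, `n_features`) is hypothetical, and the Polyak-style target update weighted by δ is our reading of the averaging step, not the authors' code.

```python
import numpy as np

def a_td(phi, sample_transition, n_features, gamma=0.9, delta=0.9, n_steps=100_000):
    """Sketch of Averaging TD-learning (A-TD) with linear features.

    phi(s) -> feature vector for state s (hypothetical interface).
    sample_transition() -> (s, r, s_next) from the simulated MDP.
    Step size alpha_k = 1000 / (k + 10000) as reported in the paper;
    the delta-weighted target averaging is an assumption about
    Algorithm 2, not a verbatim transcription.
    """
    theta = np.zeros(n_features)   # online parameters
    target = np.zeros(n_features)  # slowly averaged target parameters
    for k in range(n_steps):
        alpha = 1000.0 / (k + 10000.0)
        s, r, s_next = sample_transition()
        # TD error computed against the *target* parameters.
        td_err = r + gamma * phi(s_next) @ target - phi(s) @ theta
        theta = theta + alpha * td_err * phi(s)
        # Slowly move the target toward the online parameters.
        target = target + delta * alpha * (theta - target)
    return theta
```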
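Similarly, a sketch of the periodic variant P-TD (Algorithm 4) under the quoted rule βk,t = 10000 · (0.997)^k / (10000 + t) with Lk = 40: the target parameters are frozen for each inner loop of 40 steps and refreshed between loops. The inner-loop structure is an assumption based on the algorithm's description; the environment interface is the same hypothetical one as above.

```python
import numpy as np

def p_td(phi, sample_transition, n_features, gamma=0.9, n_outer=1000, inner_len=40):
    """Sketch of Periodic TD-learning (P-TD) with linear features.

    The target is frozen for inner_len = L_k = 40 inner SGD steps,
    then refreshed from the online parameters. The step-size rule
    beta_{k,t} = 10000 * 0.997**k / (10000 + t) is quoted from the
    paper; the loop structure itself is an illustrative assumption.
    """
    theta = np.zeros(n_features)
    for k in range(n_outer):
        target = theta.copy()  # freeze the target for this period
        for t in range(inner_len):
            beta = 10000.0 * (0.997 ** k) / (10000.0 + t)
            s, r, s_next = sample_transition()
            # Regress theta toward the frozen Bellman target.
            td_err = r + gamma * phi(s_next) @ target - phi(s) @ theta
            theta = theta + beta * td_err * phi(s)
    return theta
```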