Source Traces for Temporal Difference Learning

Authors: Silviu Pitis

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 2 (left), which reflects the 3D Gridworld, plots learning curves after 100,000 steps for the source learning algorithm given by equation 4, for S^1 (TD(0)) through S^4 and S. A similar pattern appears for S^λ with increasing λ. In each case, v_0 was initialized to 0, and the error ||v_n − v|| was averaged across the MRPs (v was computed by matrix inversion; see the sketch after this table). The curves in Figure 2 (left) are not representative of all learning rates. Figure 2 (center) shows the final error achieved by TD(0), TD(λ) at the best λ (tested in 0.1 increments), S^4, and S at various fixed α. Unless otherwise noted, all experiments reflect average results on 30 Random MRP or 3D Gridworld environments.
Researcher Affiliation | Academia | Silviu Pitis, Georgia Institute of Technology, Atlanta, GA, USA 30332, spitis@gatech.edu
Pseudocode | Yes | Algorithm 1: Tabular TD learning with source traces (a hedged reconstruction follows after this table).
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes generating environments ("Random MRP with 100 states", "1000-state 3D Gridworld") for experiments rather than using a pre-existing, publicly available dataset with concrete access information. No links or citations to specific public datasets are provided.
Dataset Splits | No | The paper describes using multiple generated environments (30 Random MRP or 3D Gridworld environments) and averaging results, but does not specify a train/validation/test split, refer to predefined splits with citations, or provide cross-validation details. The mention of 'validation' in the paper refers to evaluating the learned value function v_n against the true value v.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory, cloud resources).
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | We tested the following set of annealing schedules, adapted from Geramifard et al. 2007: α_n = α_0(N_0 + 1)/(N_0 + n^1.1) for α_0 ∈ {5e-1, 2e-1, 1e-1, 5e-2, 2e-2, 1e-2, 5e-3} and N_0 ∈ {0, 1e2, 1e4, 1e6}. In each case, v_0 was initialized to 0. β is the learning rate used in the stochastic approximation of S on line 13 of Algorithm 1, and may be fixed or annealed according to some schedule. Starting at λ = 0.5, λ was annealed linearly to 1 over the first 25,000 steps. The replay memory had infinite capacity and was invoked on every step to replay 3 past steps. (Code sketches of these schedules follow after this table.)
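
For reference, the true value v used in the error metric above satisfies the Bellman equation v = r + γPv and can be computed by matrix inversion, as the paper does. A minimal sketch in Python, assuming a tabular MRP with transition matrix P, expected-reward vector r, and discount γ (variable names are illustrative, not from the paper):

```python
import numpy as np

def true_value(P, r, gamma):
    """Solve v = r + gamma * P @ v exactly, i.e., v = (I - gamma * P)^{-1} r."""
    n = P.shape[0]
    # Solving the linear system is more stable than forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - gamma * P, r)
```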
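
The paper's Algorithm 1 is tabular TD learning with source traces. The sketch below is a reconstruction from the quantities described above, not a transcription of Algorithm 1: the source matrix S ≈ (I − γP)^{-1} is learned by stochastic approximation with learning rate β, and each TD error at state s is credited to every state x in proportion to S[x, s]. The transition-stream interface is an assumption made for illustration.

```python
import numpy as np

def td_source_traces(transitions, n_states, gamma, alpha, beta):
    """Hedged sketch of tabular TD learning with source traces.

    S estimates the source matrix (I - gamma * P)^{-1}, whose (i, j) entry
    is the expected discounted number of visits to state j starting from i.
    """
    v = np.zeros(n_states)   # value estimates, v_0 = 0 as in the experiments
    S = np.eye(n_states)     # running estimate of the source matrix
    I = np.eye(n_states)
    for s, reward, s_next in transitions:          # stream of (s, r, s') steps
        delta = reward + gamma * v[s_next] - v[s]  # one-step TD error at s
        v += alpha * S[:, s] * delta               # credit delta to all sources of s
        # Stochastic approximation of S = I + gamma * P @ S (learning rate beta),
        # corresponding to the beta-update mentioned in the Experiment Setup row:
        S[s] += beta * (I[s] + gamma * S[s_next] - S[s])
    return v, S
```

Terminal transitions (where the γ-bootstrap should be zeroed) and the replay memory described above are omitted for brevity.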
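
The annealing schedules from the Experiment Setup row translate directly to code. A minimal sketch, assuming the step counter n starts at 1:

```python
def alpha_schedule(n, alpha0, N0):
    """Learning-rate schedule adapted from Geramifard et al. 2007:
    alpha_n = alpha0 * (N0 + 1) / (N0 + n**1.1)."""
    return alpha0 * (N0 + 1) / (N0 + n ** 1.1)

def lambda_schedule(n, lam_start=0.5, anneal_steps=25_000):
    """Anneal lambda linearly from lam_start to 1.0 over the first
    anneal_steps steps, then hold at 1.0."""
    return min(1.0, lam_start + (1.0 - lam_start) * n / anneal_steps)

# Grids searched in the paper:
ALPHA0_GRID = [5e-1, 2e-1, 1e-1, 5e-2, 2e-2, 1e-2, 5e-3]
N0_GRID = [0, 1e2, 1e4, 1e6]
```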