Proximal Gradient Temporal Difference Learning Algorithms
Authors: Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods. (Section 4, Empirical Evaluation) |
| Researcher Affiliation | Collaboration | Bo Liu (UMass Amherst), Ji Liu (U. of Rochester), Mohammad Ghavamzadeh (Adobe & INRIA Lille), Sridhar Mahadevan (UMass Amherst), Marek Petrik (IBM Research) |
| Pseudocode | Yes | Algorithm 1 GTD2-MP |
| Open Source Code | No | The paper does not contain any statement about releasing their source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The Baird example [Baird, 1995] is a well-known example to test the performance of off-policy convergent algorithms. |
| Dataset Splits | No | Figure 1 shows the MSPBE curves of GTD2 and GTD2-MP over 8000 steps, averaged over 200 runs. The paper reports these run details but does not provide the dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce data partitioning into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run the experiments (e.g., CPU, GPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) that would be necessary for reproduction. |
| Experiment Setup | Yes | Constant stepsize of 0.005 for GTD2 and 0.004 for GTD2-MP, chosen via comparison studies as in [Dann et al., 2014]. The result is averaged over 200 runs, and a further parameter value of 0.001 for both GTD2 and GTD2-MP is chosen via comparison studies for each algorithm (see the sketch below the table). |
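
The Pseudocode and Experiment Setup rows refer to the paper's Algorithm 1 (GTD2-MP) and to a run with a constant stepsize of 0.004. As context, below is a minimal, hedged sketch of an extragradient-style GTD2-MP update in that spirit; the function name, the use of a single stepsize for both weight vectors, the synthetic random features, and the omission of importance-sampling corrections are all assumptions here rather than the paper's exact specification.

```python
import numpy as np

def gtd2_mp_update(theta, w, phi, phi_next, reward, gamma, alpha):
    """One extragradient-style GTD2-MP step (sketch, not the paper's exact Algorithm 1).

    theta    -- value-function weight vector
    w        -- auxiliary weight vector used by GTD2
    phi      -- feature vector of the current state
    phi_next -- feature vector of the next state under the target policy
    """
    # Extrapolation (half) step from the current iterate.
    delta = reward + gamma * phi_next @ theta - phi @ theta
    w_half = w + alpha * (delta - phi @ w) * phi
    theta_half = theta + alpha * (phi @ w) * (phi - gamma * phi_next)

    # Correction step: re-evaluate the TD error at the midpoint,
    # then update from the original iterate using the midpoint quantities.
    delta_half = reward + gamma * phi_next @ theta_half - phi @ theta_half
    w_new = w + alpha * (delta_half - phi @ w_half) * phi
    theta_new = theta + alpha * (phi @ w_half) * (phi - gamma * phi_next)
    return theta_new, w_new

# Toy usage with random features, mirroring the quoted constants
# (stepsize 0.004 for GTD2-MP, 8000 steps); the feature dimension,
# zero rewards, and gamma = 0.99 are illustrative assumptions.
rng = np.random.default_rng(0)
d = 8
theta, w = np.zeros(d), np.zeros(d)
for _ in range(8000):
    phi, phi_next = rng.random(d), rng.random(d)
    theta, w = gtd2_mp_update(theta, w, phi, phi_next,
                              reward=0.0, gamma=0.99, alpha=0.004)
```

The two-stage structure (a half step, then a correction step evaluated at the midpoint) is what distinguishes the mirror-prox variant from the plain GTD2 update, which applies a single step per sample.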