Proximal Gradient Temporal Difference Learning Algorithms
Authors: Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods. (Section 4, Empirical Evaluation) |
| Researcher Affiliation | Collaboration | Bo Liu (UMass Amherst), Ji Liu (U. of Rochester), Mohammad Ghavamzadeh (Adobe & INRIA Lille), Sridhar Mahadevan (UMass Amherst), Marek Petrik (IBM Research) |
| Pseudocode | Yes | Algorithm 1 GTD2-MP |
| Open Source Code | No | The paper does not contain any statement about releasing their source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The Baird example [Baird, 1995] is a well-known example to test the performance of off-policy convergent algorithms. |
| Dataset Splits | No | Figure 1 shows the MSPBE curves of GTD2 and GTD2-MP over 8000 steps, averaged over 200 runs. The paper reports these run details but does not provide the dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce data partitioning into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run the experiments (e.g., CPU, GPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) that would be necessary for reproduction. |
| Experiment Setup | Yes | Constant stepsize of 0.005 for GTD2 and 0.004 for GTD2-MP, chosen via comparison studies as in [Dann et al., 2014]. The result is averaged over 200 runs, and a further parameter value of 0.001 for both GTD2 and GTD2-MP is chosen via comparison studies for each algorithm (see the sketch below the table). |
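
The Pseudocode and Experiment Setup rows refer to the paper's Algorithm 1 (GTD2-MP) and to a run with a constant stepsize of 0.004. As context, below is a minimal, hedged sketch of an extragradient-style GTD2-MP update in that spirit; the function name, the use of a single stepsize for both weight vectors, the synthetic random features, and the omission of importance-sampling corrections are all assumptions here rather than the paper's exact specification.

```python
import numpy as np

def gtd2_mp_update(theta, w, phi, phi_next, reward, gamma, alpha):
    """One extragradient-style GTD2-MP step (sketch, not the paper's exact Algorithm 1).

    theta    -- value-function weight vector
    w        -- auxiliary weight vector used by GTD2
    phi      -- feature vector of the current state
    phi_next -- feature vector of the next state under the target policy
    """
    # Extrapolation (half) step from the current iterate.
    delta = reward + gamma * phi_next @ theta - phi @ theta
    w_half = w + alpha * (delta - phi @ w) * phi
    theta_half = theta + alpha * (phi @ w) * (phi - gamma * phi_next)

    # Correction step: re-evaluate the TD error at the midpoint,
    # then update from the original iterate using the midpoint quantities.
    delta_half = reward + gamma * phi_next @ theta_half - phi @ theta_half
    w_new = w + alpha * (delta_half - phi @ w_half) * phi
    theta_new = theta + alpha * (phi @ w_half) * (phi - gamma * phi_next)
    return theta_new, w_new

# Toy usage with random features, mirroring the quoted constants
# (stepsize 0.004 for GTD2-MP, 8000 steps); the feature dimension,
# zero rewards, and gamma = 0.99 are illustrative assumptions.
rng = np.random.default_rng(0)
d = 8
theta, w = np.zeros(d), np.zeros(d)
for _ in range(8000):
    phi, phi_next = rng.random(d), rng.random(d)
    theta, w = gtd2_mp_update(theta, w, phi, phi_next,
                              reward=0.0, gamma=0.99, alpha=0.004)
```

The two-stage structure (a half step, then a correction step evaluated at the midpoint) is what distinguishes the mirror-prox variant from the plain GTD2 update, which applies a single step per sample.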