Gradient Temporal Difference with Momentum: Stability and Convergence
Authors: Rohan Deb, Shalabh Bhatnagar (pp. 6488-6496)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we evaluate these algorithms on standard RL problems and report improvement in performance over the vanilla algorithms. We evaluate the momentum-based GTD algorithms defined in section on four standard problems of policy evaluation in reinforcement learning, namely the Boyan Chain (Boyan 1999), 5-state random walk (Sutton et al. 2009), 19-state random walk (Sutton and Barto 2018), and Random MDP (Sutton et al. 2009). See Appendix A4 in (Deb and Bhatnagar 2021) for a detailed description of the MDP settings and (Dann, Neumann, and Peters 2014) for details on implementation. We run the three algorithms GTD, GTD2, and TDC, along with their heavy-ball momentum variants, in the One-TS and Three-TS settings and compare the RMSPBE (root of MSPBE) across episodes. Figures 1 to 4 plot these results. |
| Researcher Affiliation | Academia | Rohan Deb, Shalabh Bhatnagar Department of Computer Science and Automation, Indian Institute of Science, Bangalore rohandeb@iisc.ac.in, shalabh@iisc.ac.in |
| Pseudocode | No | The paper provides mathematical equations for the algorithms (e.g., (11)-(16)) but does not include a clearly labeled pseudocode block or algorithm box. |
| Open Source Code | No | No explicit statement or link providing concrete access to the source code for the methodology described in the paper was found. |
| Open Datasets | Yes | We evaluate the momentum-based GTD algorithms defined in section on four standard problems of policy evaluation in reinforcement learning, namely the Boyan Chain (Boyan 1999), 5-state random walk (Sutton et al. 2009), 19-state random walk (Sutton and Barto 2018), and Random MDP (Sutton et al. 2009). |
| Dataset Splits | No | No specific dataset split information (percentages, sample counts, or explicit validation set details) was provided for reproduction. The paper mentions 'averaged over 100 independent runs' but does not specify how data was partitioned into training/validation sets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., libraries, frameworks, or solvers) were mentioned that would be needed to replicate the experiment. |
| Experiment Setup | Yes | The paper specifies its experimental configuration in Table 1: "Choice of step-size parameters". |
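
The rows above describe an experiment that compares GTD, GTD2, and TDC against their heavy-ball momentum variants by tracking RMSPBE across episodes on small policy-evaluation MDPs. Since the paper provides neither pseudocode nor source code (see the Pseudocode and Open Source Code rows), the sketch below illustrates what one such run could look like: TDC with a heavy-ball momentum term on the 5-state random walk, reporting RMSPBE per episode. The tabular features, the uniform state weighting used in the MSPBE norm, and the constant step sizes `alpha`, `beta` and momentum parameter `eta` are illustrative assumptions, not the settings from the paper's Table 1.

```python
# Hedged sketch (not the authors' code): TDC with heavy-ball momentum on the
# 5-state random walk, tracking RMSPBE per episode.
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 5, 1.0
Phi = np.eye(n)                        # assumed tabular features

# Model of the 5-state random walk, terminal states folded in with value 0.
P = np.zeros((n, n))
for s in range(n):
    if s - 1 >= 0:
        P[s, s - 1] = 0.5
    if s + 1 < n:
        P[s, s + 1] = 0.5
r_bar = np.zeros(n)
r_bar[n - 1] = 0.5                     # +1 reward w.p. 0.5 when exiting on the right

d = np.full(n, 1.0 / n)                # assumed state weighting for the MSPBE norm
D = np.diag(d)
A = Phi.T @ D @ (np.eye(n) - gamma * P) @ Phi    # E_d[phi (phi - gamma phi')^T]
b = Phi.T @ D @ r_bar                            # E_d[r phi]
C_inv = np.linalg.inv(Phi.T @ D @ Phi)           # E_d[phi phi^T]^{-1}

def rmspbe(theta):
    """Root MSPBE: sqrt( (b - A theta)^T C^{-1} (b - A theta) )."""
    g = b - A @ theta
    return float(np.sqrt(g @ C_inv @ g))

theta = np.zeros(n)
theta_prev = theta.copy()
w = np.zeros(n)
alpha, beta, eta = 0.05, 0.1, 0.5      # step sizes and momentum: illustrative only

for episode in range(100):
    s = n // 2                          # start in the centre state
    while s is not None:
        s_next = s + rng.choice([-1, 1])
        if s_next < 0:
            reward, phi_next, s_next = 0.0, np.zeros(n), None
        elif s_next >= n:
            reward, phi_next, s_next = 1.0, np.zeros(n), None
        else:
            reward, phi_next = 0.0, Phi[s_next]
        phi_s = Phi[s]
        delta = reward + gamma * phi_next @ theta - phi_s @ theta

        # TDC direction plus a heavy-ball momentum term eta * (theta_t - theta_{t-1}).
        tdc_dir = delta * phi_s - gamma * phi_next * (phi_s @ w)
        theta_new = theta + alpha * tdc_dir + eta * (theta - theta_prev)
        w = w + beta * (delta - phi_s @ w) * phi_s
        theta_prev, theta = theta, theta_new
        s = s_next
    if episode % 10 == 0:
        print(f"episode {episode:3d}  RMSPBE {rmspbe(theta):.4f}")
```

Setting `eta = 0` recovers vanilla TDC, which is the baseline the paper's Figures 1 to 4 compare against; analogous momentum terms can be attached to the GTD and GTD2 updates in the same way.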