Gradient Temporal Difference with Momentum: Stability and Convergence

Authors: Rohan Deb, Shalabh Bhatnagar

AAAI 2022, pp. 6488-6496

Reproducibility assessment (variable: result, followed by the LLM response):
Research Type: Experimental. Finally, we evaluate these algorithms on standard RL problems and report improvement in performance over the vanilla algorithms. We evaluate the momentum-based GTD algorithms defined in the previous section on four standard problems of policy evaluation in reinforcement learning, namely Boyan Chain (Boyan 1999), 5-State Random Walk (Sutton et al. 2009), 19-State Random Walk (Sutton and Barto 2018), and Random MDP (Sutton et al. 2009). See Appendix A4 in (Deb and Bhatnagar 2021) for a detailed description of the MDP settings and (Dann, Neumann, and Peters 2014) for details on implementation. We run the three algorithms GTD, GTD2, and TDC, along with their heavy-ball momentum variants, in the One-TS and Three-TS settings and compare the RMSPBE (root of MSPBE) across episodes. Figures 1 to 4 plot these results. (An illustrative RMSPBE computation is sketched after the table below.)
Researcher Affiliation: Academia. Rohan Deb, Shalabh Bhatnagar; Department of Computer Science and Automation, Indian Institute of Science, Bangalore; rohandeb@iisc.ac.in, shalabh@iisc.ac.in
Pseudocode: No. The paper provides the mathematical update equations for the algorithms (e.g., (11)-(16)) but does not include a clearly labeled pseudocode block or algorithm box. (An illustrative heavy-ball TDC update is sketched after the table below.)
Open Source Code: No. No explicit statement or link providing concrete access to source code for the methodology described in the paper was found.
Open Datasets: Yes. The momentum-based GTD algorithms are evaluated on four standard policy-evaluation benchmarks: Boyan Chain (Boyan 1999), 5-State Random Walk (Sutton et al. 2009), 19-State Random Walk (Sutton and Barto 2018), and Random MDP (Sutton et al. 2009).
Dataset Splits: No. No specific dataset split information (percentages, sample counts, or explicit validation-set details) is provided for reproduction. The paper mentions results averaged over 100 independent runs but does not specify how data was partitioned into training/validation sets.
Hardware Specification: No. No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper.
Software Dependencies: No. No specific software dependencies with version numbers (e.g., libraries, frameworks, or solvers) needed to replicate the experiments are mentioned.
Experiment Setup: Yes. Table 1 ("Choice of step-size parameters") specifies the step-size settings used in the experiments.
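
For context on the RMSPBE comparison cited under Research Type, the following is a minimal sketch of the metric, assuming linear value-function approximation and the standard MSPBE definition of Sutton et al. (2009). The sample-based estimators, the variable names (phi, rewards, gamma), and the synthetic data in the usage example are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rmspbe(theta, phi, phi_next, rewards, gamma):
    """Root of the Mean Squared Projected Bellman Error (Sutton et al. 2009),
    estimated from sampled transitions, assuming linear value functions.

    MSPBE(theta) = (b - A theta)^T C^{-1} (b - A theta), where
        A = E[phi (phi - gamma * phi')^T],  b = E[r * phi],  C = E[phi phi^T].
    """
    n = len(rewards)
    A = phi.T @ (phi - gamma * phi_next) / n
    b = phi.T @ rewards / n
    C = phi.T @ phi / n
    err = b - A @ theta
    return float(np.sqrt(err @ np.linalg.solve(C, err)))

# Illustrative usage on random data (shapes only; not the paper's MDPs).
rng = np.random.default_rng(0)
n, d = 1000, 4
phi = rng.normal(size=(n, d))
phi_next = rng.normal(size=(n, d))
rewards = rng.normal(size=n)
print(rmspbe(np.zeros(d), phi, phi_next, rewards, gamma=0.95))
```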
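
Since the paper states the momentum updates only as equations (11)-(16) without pseudocode, below is a minimal sketch of what a heavy-ball momentum variant of TDC could look like for linear policy evaluation. The update form (adding an eta * (theta_t - theta_{t-1}) term to the standard TDC iterates) and the constant step-size and momentum values are assumptions for illustration; they are not guaranteed to match the paper's exact equations or its Table 1 settings.

```python
import numpy as np

def tdc_heavy_ball(transitions, d, gamma=0.95, alpha=0.05, beta=0.1, eta=0.5):
    """One pass of TDC with a heavy-ball momentum term on the primary iterate.

    transitions: iterable of (phi, reward, phi_next) with phi, phi_next in R^d.
    Standard TDC (Sutton et al. 2009):
        delta  = r + gamma * phi_next @ theta - phi @ theta
        theta += alpha * (delta * phi - gamma * (phi @ w) * phi_next)
        w     += beta  * (delta - phi @ w) * phi
    Heavy-ball variant (illustrative): add eta * (theta - theta_prev).
    """
    theta = np.zeros(d)
    theta_prev = np.zeros(d)
    w = np.zeros(d)
    for phi, r, phi_next in transitions:
        delta = r + gamma * phi_next @ theta - phi @ theta
        grad_term = delta * phi - gamma * (phi @ w) * phi_next
        theta_new = theta + alpha * grad_term + eta * (theta - theta_prev)
        w = w + beta * (delta - phi @ w) * phi
        theta_prev, theta = theta, theta_new
    return theta, w

# Illustrative usage on synthetic transitions (not one of the paper's MDPs).
rng = np.random.default_rng(1)
d = 4
transitions = [(rng.normal(size=d), rng.normal(), rng.normal(size=d))
               for _ in range(500)]
theta, w = tdc_heavy_ball(transitions, d)
print(theta)
```

As a rough reading of the One-TS versus Three-TS distinction mentioned above: in a one-timescale setting the step sizes and momentum schedule would decay at comparable rates, while in a three-timescale setting the theta iterate, the auxiliary w iterate, and the momentum term each follow their own schedule. The constant values used here are placeholders; the paper's actual choices are given in its Table 1.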