Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Revisiting a Design Choice in Gradient Temporal Difference Learning

Authors: Xiaochi Qian, Shangtong Zhang

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	6 EXPERIMENTS We now empirically compare ($A_t^\top$TD) with a few other TD algorithms with linear function approximation... We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995)... We report the square root of the mean squared projected Bellman error (RMSPBE) at each time step.
Researcher Affiliation	Academia	Xiaochi Qian Department of Computer Science University of Oxford EMAIL Shangtong Zhang Department of Computer Science University of Virginia EMAIL
Pseudocode	No	The algorithms (1), (3), (GTD), and ($A_t^\top$TD) are presented as mathematical equations (e.g., "$w_{t+1} \doteq w_t + \alpha_t (R_{t+1} + \gamma x_{t+1}^\top w_t - x_t^\top w_t) x_t$ (1)", "$w_{t+1} \doteq w_t + \alpha_t \rho_{t+f(t)} (x_{t+f(t)} - \gamma x_{t+f(t)+1}) x_{t+f(t)}^\top \rho_t \delta_t x_t$ ($A_t^\top$TD)"). There are no explicit "Algorithm" or "Pseudocode" blocks.
Open Source Code	No	"We base our implementation on the open-sourced implementation from Ghiassian et al. (2020)." This refers to the baselines, not the authors' own code for $A_t^\top$TD. No other explicit statement or link for their code is provided.
Open Datasets	Yes	We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995), which are also used in Ghiassian et al. (2020).
Dataset Splits	No	"We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995), which are also used in Ghiassian et al. (2020)... For each algorithm, we tune its learning rate in $\{2^{-20}, \dots, 2^{-1}, 1\}$ and report the results with the best learning rate (in terms of minimizing RMSPBE at the last step)." No explicit dataset split information is provided in the main text.
Hardware Specification	No	No specific hardware details (e.g., GPU models, CPU types, memory) are mentioned for running the experiments. The mention of "3 GHz CPU" is within a theoretical argument about memory cost, not the experimental setup.
Software Dependencies	No	"We base our implementation on the open-sourced implementation from Ghiassian et al. (2020)." This refers to the baselines and doesn't provide specific software versions for the authors' own work or the environment.
Experiment Setup	Yes	For each algorithm, we tune its learning rate in $\{2^{-20}, \dots, 2^{-1}, 1\}$ and report the results with the best learning rate (in terms of minimizing RMSPBE at the last step).
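
The tuning protocol quoted in the Experiment Setup row (sweep the learning rate over 2^-20, ..., 2^-1, 1 and keep the value with the lowest error at the last step) can be illustrated with a minimal sketch. This is not the authors' code, and no such code is linked from the paper. The function names (`learning_rate_grid`, `linear_td_final_error`, `select_best`), the two-state chain, the fixed step size, and the use of root-mean-square value error in place of RMSPBE are all assumptions made for illustration. The inner loop applies the standard linear TD update, the baseline numbered (1) in the Pseudocode row, rather than the paper's proposed algorithm.

```python
import math
import random


def learning_rate_grid():
    """Return the grid 2^-20, ..., 2^-1, 1 quoted in the Experiment Setup row."""
    return [2.0 ** k for k in range(-20, 1)]


def linear_td_final_error(alpha, steps=2000, gamma=0.9, seed=0):
    """Run linear TD(0) with a fixed step size on a hypothetical 2-state chain.

    Returns the root-mean-square value error at the last step, used here
    only as a stand-in for the RMSPBE metric reported in the paper.
    """
    rng = random.Random(seed)
    features = [(1.0, 0.0), (0.0, 1.0)]  # hypothetical one-hot features
    rewards = [0.0, 1.0]                 # reward for entering each state
    # The next state is uniform, so both states share v = 0.5 / (1 - gamma).
    true_value = 0.5 / (1.0 - gamma)

    w = [0.0, 0.0]
    state = 0
    for _ in range(steps):
        next_state = rng.randrange(2)
        reward = rewards[next_state]
        x, x_next = features[state], features[next_state]
        v = sum(wi * xi for wi, xi in zip(w, x))
        v_next = sum(wi * xi for wi, xi in zip(w, x_next))
        delta = reward + gamma * v_next - v  # TD error
        w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
        state = next_state

    errors = [sum(wi * xi for wi, xi in zip(w, x)) - true_value for x in features]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def select_best(grid, run):
    """Return (learning_rate, final_error) minimizing the last-step error."""
    return min(((alpha, run(alpha)) for alpha in grid), key=lambda pair: pair[1])


if __name__ == "__main__":
    best_alpha, best_error = select_best(learning_rate_grid(), linear_td_final_error)
    print(f"best learning rate: {best_alpha}, final error: {best_error:.4f}")
```

Selecting on the last-step error, as the quoted setup does, favors step sizes that end close to the solution within the fixed step budget; the row does not mention random seeds or run counts, so the single fixed seed here is likewise only a convenience of the sketch.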