Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Revisiting a Design Choice in Gradient Temporal Difference Learning
Authors: Xiaochi Qian, Shangtong Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 EXPERIMENTS. We now empirically compare (A⊤TD) with a few other TD algorithms with linear function approximation... We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995)... We report the square root of the mean squared projected Bellman error (RMSPBE) at each time step. |
| Researcher Affiliation | Academia | Xiaochi Qian, Department of Computer Science, University of Oxford, EMAIL; Shangtong Zhang, Department of Computer Science, University of Virginia, EMAIL |
| Pseudocode | No | The algorithms (1), (3), (GTD), and (A⊤TD) are presented as mathematical equations (e.g., "$w_{t+1} \doteq w_t + \alpha_t \left(R_{t+1} + \gamma x_{t+1}^\top w_t - x_t^\top w_t\right) x_t$ (1)" and "$w_{t+1} \doteq w_t + \alpha_t \rho_{t+f(t)} \left(x_{t+f(t)} - \gamma x_{t+f(t)+1}\right) x_{t+f(t)}^\top \rho_t \delta_t x_t$ (A⊤TD)"). There are no explicit "Algorithm" or "Pseudocode" blocks. |
| Open Source Code | No | "We base our implementation on the open-sourced implementation from Ghiassian et al. (2020)." This refers to the baselines, not the authors' own code for A⊤TD. No other explicit statement or link for their code is provided. |
| Open Datasets | Yes | We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995), which are also used in Ghiassian et al. (2020). |
| Dataset Splits | No | "We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995), which are also used in Ghiassian et al. (2020)... For each algorithm, we tune its learning rate in {2^-20, ..., 2^-1, 1} and report the results with the best learning rate (in terms of minimizing RMSPBE at the last step)." No explicit dataset split information is provided in the main text. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) are mentioned for running the experiments. The mention of "3 GHz CPU" is within a theoretical argument about memory cost, not the experimental setup. |
| Software Dependencies | No | "We base our implementation on the open-sourced implementation from Ghiassian et al. (2020)." This refers to the baselines and does not provide specific software versions for the authors' own work or the environment. |
| Experiment Setup | Yes | For each algorithm, we tune its learning rate in {2^-20, ..., 2^-1, 1} and report the results with the best learning rate (in terms of minimizing RMSPBE at the last step). |
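Since the Pseudocode row notes that the updates appear only as equations, the following is a minimal Python sketch of the two updates quoted there: linear TD, labeled (1), and the two-sample update labeled (A⊤TD). It is an illustration under stated assumptions, not the authors' code. The function names are hypothetical, the TD error `delta_t` and importance ratios `rho_t`, `rho_f` are taken as inputs computed by the caller, and the gap function f(t) and the buffering of step-t quantities until step t + f(t) arrives are left to the caller.

```python
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def td_update(w: Sequence[float], x_t: Sequence[float], x_next: Sequence[float],
              reward: float, gamma: float, alpha: float) -> Vector:
    """Linear TD: w + alpha * (R + gamma * x_next^T w - x_t^T w) * x_t."""
    delta = reward + gamma * dot(x_next, w) - dot(x_t, w)
    return [wi + alpha * delta * xi for wi, xi in zip(w, x_t)]


def atd_update(w: Sequence[float], alpha: float, gamma: float,
               # quantities observed at step t (supplied by the caller)
               x_t: Sequence[float], rho_t: float, delta_t: float,
               # quantities observed at the later step t + f(t)
               x_f: Sequence[float], x_f_next: Sequence[float],
               rho_f: float) -> Vector:
    """w + alpha * rho_f * (x_f - gamma * x_f_next) * x_f^T (rho_t * delta_t * x_t).

    The step-(t + f(t)) factor plays the role of a sample of A^T and the
    step-t factor a sample of A w + b, so the product estimates A^T (A w + b).
    """
    # x_f^T x_t is a scalar, so the outer product (x_f - gamma * x_f_next) x_f^T
    # is never materialised: this function only works with length-K vectors.
    scale = alpha * rho_f * rho_t * delta_t * dot(x_f, x_t)
    return [wi + scale * (a - gamma * b) for wi, a, b in zip(w, x_f, x_f_next)]
```

For example, `td_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], reward=1.0, gamma=0.9, alpha=0.5)` gives a TD error of 1 and returns `[0.5, 0.0]`. Pairing samples from two different time steps, rather than reusing one transition for both factors, is the design choice the quoted (A⊤TD) equation encodes; how large the gap f(t) should be is not specified in this sketch.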
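The Experiment Setup row describes a per-algorithm step-size sweep over {2^-20, ..., 2^-1, 1}, keeping the step size with the lowest RMSPBE at the last step. A minimal Python sketch of that selection rule follows. `run` is a hypothetical callable standing in for one full training run that returns an RMSPBE value per time step, and ranking non-finite (diverged) final values as worst is our assumption, not something the row states.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Candidate step sizes {2^-20, 2^-19, ..., 2^-1, 1}, i.e. 2^k for k = -20, ..., 0.
LEARNING_RATES: List[float] = [2.0 ** k for k in range(-20, 1)]


def select_best_lr(
    run: Callable[[float], Sequence[float]],
    learning_rates: Sequence[float] = LEARNING_RATES,
) -> Tuple[float, Sequence[float]]:
    """Return the step size with the lowest final-step error, and its curve.

    `run` (hypothetical) trains one algorithm with the given step size and
    returns its error (e.g. RMSPBE) at every time step.
    """
    curves: Dict[float, Sequence[float]] = {lr: run(lr) for lr in learning_rates}

    def final_error(lr: float) -> float:
        # Rank by the last time step only; NaN or inf (divergence) ranks last.
        last = curves[lr][-1]
        return last if math.isfinite(last) else math.inf

    best = min(learning_rates, key=final_error)
    return best, curves[best]
```

Selecting on the last step alone, as the row describes, favors step sizes that end well even if they converge slowly; a criterion such as area under the curve would rank runs differently.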