Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity
Authors: Shaocong Ma, Ziyi Chen, Yi Zhou, Shaofeng Zou
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose a variance-reduced Greedy-GQ (VR-Greedy-GQ) algorithm for off-policy optimal control. In particular, the algorithm applies the SVRG-based variance reduction scheme to reduce the stochastic variance of the two time-scale updates. We study the finite-time convergence of VR-Greedy-GQ under linear function approximation and Markovian sampling and show that the algorithm achieves a much smaller bias and variance error than the original Greedy-GQ. In particular, we prove that VR-Greedy-GQ achieves an improved sample complexity that is in the order of O(ϵ 2). We further compare the performance of VR-Greedy-GQ with that of Greedy-GQ in various RL experiments to corroborate our theoretical findings. |
| Researcher Affiliation | Academia | Shaocong Ma, Ziyi Chen & Yi Zhou Department of ECE University of Utah Salt Lake City, UT 84112 EMAIL Shaofeng Zou Department of EE University at Buffalo Buffalo, NY 14260 EMAIL |
| Pseudocode | Yes | Algorithm 1: Variance-Reduced Greedy-GQ |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The paper uses the Garnet problem and Frozen Lake game, which are well-known reinforcement learning environments. However, it describes generating features and setting up parameters for these problems, rather than providing a link or specific citation to a publicly available dataset used for training. For example, it states: "We set n S = 5, n A = 3, b = 2, d = 4 and generate the features Φ Rn S d via the uniform distribution on [0, 1]" for Garnet, and "We generate a Gaussian feature matrix with dimension 8" for Frozen Lake, indicating the data was generated specifically for their experiments without providing public access to their generated data. |
| Dataset Splits | No | The paper does not explicitly provide details about a validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library names with version numbers). |
| Experiment Setup | Yes | For the Garnet problem, we set n S = 5, n A = 3, b = 2, d = 4... The discount factor is set to be γ = 0.95. ... We set the default learning rates as ηθ = 0.02 and ηω = 0.01 for both VR-Greedy-GQ and Greedy-GQ algorithm. For VR-Greedy-GQ, we set the default batch size as M = 3000. ... We set the learning rates as ηθ = 0.2 and ηω = 0.1 for both algorithms and set the batch size as M = 3000 for the VR-Greedy-GQ. We run 200k iterations for each of the 10 trajectories. |