Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

Authors: Shaocong Ma, Ziyi Chen, Yi Zhou, Shaofeng Zou

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we propose a variance-reduced Greedy-GQ (VR-Greedy-GQ) algorithm for off-policy optimal control. In particular, the algorithm applies the SVRG-based variance reduction scheme to reduce the stochastic variance of the two time-scale updates. We study the finite-time convergence of VR-Greedy-GQ under linear function approximation and Markovian sampling and show that the algorithm achieves a much smaller bias and variance error than the original Greedy-GQ. In particular, we prove that VR-Greedy-GQ achieves an improved sample complexity in the order of O(ϵ^{-2}). We further compare the performance of VR-Greedy-GQ with that of Greedy-GQ in various RL experiments to corroborate our theoretical findings.
Researcher Affiliation | Academia | Shaocong Ma, Ziyi Chen & Yi Zhou, Department of ECE, University of Utah, Salt Lake City, UT 84112 ({s.ma,u1276972,yi.zhou}@utah.edu); Shaofeng Zou, Department of EE, University at Buffalo, Buffalo, NY 14260 (szou3@buffalo.edu)
Pseudocode | Yes | Algorithm 1: Variance-Reduced Greedy-GQ (a minimal sketch of the update structure appears after this table)
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | The paper uses the Garnet problem and the Frozen Lake game, which are well-known reinforcement learning environments. However, it describes generating features and setting up parameters for these problems rather than providing a link or specific citation to a publicly available dataset. For example, it states: "We set n_S = 5, n_A = 3, b = 2, d = 4 and generate the features Φ ∈ R^{n_S × d} via the uniform distribution on [0, 1]" for Garnet, and "We generate a Gaussian feature matrix with dimension 8" for Frozen Lake, indicating the data was generated specifically for their experiments, without public access to the generated data (see the feature-generation sketch after this table).
Dataset Splits | No | The paper does not explicitly provide details about a validation dataset split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies with version numbers (e.g., specific libraries and their versions).
Experiment Setup | Yes | For the Garnet problem, we set n_S = 5, n_A = 3, b = 2, d = 4... The discount factor is set to γ = 0.95. ... We set the default learning rates as η_θ = 0.02 and η_ω = 0.01 for both the VR-Greedy-GQ and Greedy-GQ algorithms. For VR-Greedy-GQ, we set the default batch size as M = 3000. ... We set the learning rates as η_θ = 0.2 and η_ω = 0.1 for both algorithms and set the batch size as M = 3000 for VR-Greedy-GQ. We run 200k iterations for each of the 10 trajectories. (These settings are collected into a config sketch after this table.)
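
The two time-scale structure and SVRG-style correction described in the abstract and Algorithm 1 can be summarized as follows. This is a minimal sketch, not the authors' implementation: the callables sample, grad_theta, and grad_omega are hypothetical stand-ins for the Markovian sampler and the Greedy-GQ update directions (sign conventions folded into the callables), and the default step sizes mirror the Garnet settings quoted above.

    import numpy as np

    def vr_greedy_gq_sketch(sample, grad_theta, grad_omega, theta, omega,
                            eta_theta=0.02, eta_omega=0.01, M=3000, epochs=10):
        # Each epoch: snapshot the iterates, form batch reference gradients over
        # M samples, then run variance-reduced two time-scale inner updates.
        # The precise sampling schedule of Algorithm 1 may differ from this sketch.
        for _ in range(epochs):
            theta_ref, omega_ref = theta.copy(), omega.copy()
            batch = [sample() for _ in range(M)]
            g_theta = np.mean([grad_theta(theta_ref, omega_ref, s) for s in batch], axis=0)
            g_omega = np.mean([grad_omega(theta_ref, omega_ref, s) for s in batch], axis=0)
            for _ in range(M):
                s = sample()  # fresh sample along the (Markovian) trajectory
                # SVRG correction: stochastic gradient at the current iterate,
                # recentered by the snapshot gradient plus the batch average.
                v_theta = grad_theta(theta, omega, s) - grad_theta(theta_ref, omega_ref, s) + g_theta
                v_omega = grad_omega(theta, omega, s) - grad_omega(theta_ref, omega_ref, s) + g_omega
                theta = theta - eta_theta * v_theta
                omega = omega - eta_omega * v_omega
        return theta, omega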
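
The feature matrices quoted in the Open Datasets row can be regenerated from the stated distributions alone. A minimal sketch, assuming numpy and a seed of our choosing (the paper reports neither); the 16-state row count for Frozen Lake is our assumption based on the standard 4x4 map.

    import numpy as np

    rng = np.random.default_rng(0)  # seed is ours; the paper does not report one

    # Garnet: n_S = 5 states, n_A = 3 actions, branching factor b = 2,
    # feature dimension d = 4; features drawn uniformly on [0, 1].
    n_S, n_A, b, d = 5, 3, 2, 4
    Phi_garnet = rng.uniform(0.0, 1.0, size=(n_S, d))

    # Frozen Lake: Gaussian feature matrix with feature dimension 8; one row
    # per state of the standard 4x4 map is our assumption.
    Phi_frozen_lake = rng.normal(size=(16, 8))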
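
For reference, the hyperparameters quoted in the Experiment Setup row, collected in one place. The dict layout and key names are ours, and attributing the second group of settings (including the 200k iterations over 10 trajectories) to the Frozen Lake experiment is our reading of the elided quote.

    # Hyperparameters as quoted above; structure and key names are ours.
    garnet_cfg = {
        "n_S": 5, "n_A": 3, "b": 2, "d": 4,
        "gamma": 0.95,
        "eta_theta": 0.02, "eta_omega": 0.01,  # defaults, both algorithms
        "batch_size_M": 3000,                  # VR-Greedy-GQ only
    }
    second_experiment_cfg = {                  # Frozen Lake, per our reading
        "eta_theta": 0.2, "eta_omega": 0.1,    # both algorithms
        "batch_size_M": 3000,                  # VR-Greedy-GQ only
        "iterations": 200_000, "trajectories": 10,
    }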