Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation

Authors: Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson

ICML 2020

Reproducibility variables, the extracted result for each, and the supporting LLM response:
Research Type: Experimental
"We design experiments to answer the following questions: (a) Can GEM approximate the emphasis as promised? (b) Can the GEM-learned emphasis boost performance compared with the followon trace? All curves are averaged over 30 independent runs. Shadowed regions indicate one standard deviation. All the implementations are made publicly available for future research."

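For context on question (b): the followon trace is the Emphatic TD recursion that serves as the baseline emphasis estimate here. A minimal sketch of that baseline (the standard recursion from Emphatic TD; variable names are illustrative, and this is not the paper's GEM update):

```python
import numpy as np

def followon_trace(interests, rhos, gamma):
    """Followon trace: F_t = i(S_t) + gamma * rho_{t-1} * F_{t-1}.

    interests[t] is the interest i(S_t); rhos[t] is the importance
    sampling ratio at step t. GEM instead learns the emphasis with
    function approximation rather than tracking this recursion."""
    F = np.zeros(len(interests))
    F[0] = interests[0]
    for t in range(1, len(interests)):
        F[t] = interests[t] + gamma * rhos[t - 1] * F[t - 1]
    return F

# Example: uniform interest and a few importance sampling ratios.
F = followon_trace(np.ones(5), np.array([1.0, 0.5, 2.0, 1.0, 1.0]), gamma=0.99)
```
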
Researcher Affiliation: Collaboration
University of Oxford; Auburn University; Huawei Technologies.

Pseudocode: Yes
Algorithm 1 (COF-PAC).

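A note on the "two-timescale" in the algorithm's name: convergence arguments of this kind require the actor's step size to decay faster than the critic's, so the fast components (critic and GEM) effectively equilibrate between actor updates. A generic sketch of such schedules (standard two-timescale stochastic-approximation conditions, not the schedules from Algorithm 1):

```python
# Both schedules satisfy the Robbins-Monro conditions (the sum of
# steps diverges, the sum of squared steps converges), and
# beta(t) / alpha(t) -> 0, which separates the two timescales.
def alpha(t):
    return 1.0 / (t + 1) ** 0.6   # fast timescale: critic and emphasis (GEM)

def beta(t):
    return 1.0 / (t + 1)          # slow timescale: actor

# beta(t) / alpha(t) = (t + 1) ** -0.4 -> 0 as t grows.
```
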
Open Source Code: Yes
"All the implementations are made publicly available for future research." Code: https://github.com/ShangtongZhang/DeepRL

Open Datasets: Yes
"We benchmarked COF-PAC, ACE and TD3 (Fujimoto et al., 2018) in Reacher-v2 from OpenAI Gym (Brockman et al., 2016)."

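For reference, a minimal sketch of loading this benchmark with the Gym API current at the time of the paper (circa 2020; Reacher-v2 additionally requires the MuJoCo backend, mujoco-py, whose version the paper does not pin):

```python
import gym  # OpenAI Gym (Brockman et al., 2016)

env = gym.make('Reacher-v2')   # MuJoCo-based reaching task
obs = env.reset()              # pre-0.26 Gym API: reset returns only obs
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```
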
Dataset Splits: No
The paper mentions training and evaluation but does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts).

Hardware Specification: No
The acknowledgments mention a "generous equipment grant from NVIDIA", indicating NVIDIA GPUs were used, but no specific GPU model, CPU, or other detailed hardware specifications for the experiments are provided.

Software Dependencies: No
The paper gives implementation details but no version numbers for software dependencies (e.g., Python, PyTorch, or TensorFlow versions).

Experiment Setup: Yes
"For GEM, we consider a fixed learning rate α and tune it from {0.1 × 2^-1, . . . , 0.1 × 2^-6}. [...] For GEM-ETD(0), we set α1 = 0.025 and tune α2 in the same range as α. [...] Our implementation is based on Zhang et al. (2019), and we inherited their hyperparameters."
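
A minimal sketch of that tuning grid, assuming the exponents in the quoted range are negative (consistent with α1 = 0.025 = 0.1 × 2^-2 lying inside the range):

```python
# Learning-rate grid {0.1 * 2**-1, ..., 0.1 * 2**-6} (assumed negative exponents).
alphas = [0.1 * 2 ** -k for k in range(1, 7)]
# -> [0.05, 0.025, 0.0125, 0.00625, 0.003125, 0.0015625]

alpha_1 = 0.025        # fixed for GEM-ETD(0); equals 0.1 * 2**-2
alpha_2_grid = alphas  # alpha_2 is tuned over the same range as alpha
```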