Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation

Authors: Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson

ICML 2020

Reproducibility variables, the extracted result for each, and the supporting LLM response:
Research Type: Experimental
"We design experiments to answer the following questions: (a) Can GEM approximate the emphasis as promised? (b) Can the GEM-learned emphasis boost performance compared with the followon trace? All curves are averaged over 30 independent runs. Shadowed regions indicate one standard deviation. All the implementations are made publicly available for future research."

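For context on question (b): the followon trace is the Emphatic TD recursion that serves as the baseline emphasis estimate here. A minimal sketch of that baseline (the standard recursion from Emphatic TD; variable names are illustrative, and this is not the paper's GEM update):

```python
import numpy as np

def followon_trace(interests, rhos, gamma):
    """Followon trace: F_t = i(S_t) + gamma * rho_{t-1} * F_{t-1}.

    interests[t] is the interest i(S_t); rhos[t] is the importance
    sampling ratio at step t. GEM instead learns the emphasis with
    function approximation rather than tracking this recursion."""
    F = np.zeros(len(interests))
    F[0] = interests[0]
    for t in range(1, len(interests)):
        F[t] = interests[t] + gamma * rhos[t - 1] * F[t - 1]
    return F

# Example: uniform interest and a few importance sampling ratios.
F = followon_trace(np.ones(5), np.array([1.0, 0.5, 2.0, 1.0, 1.0]), gamma=0.99)
```
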
Researcher Affiliation: Collaboration
University of Oxford; Auburn University; Huawei Technologies.

Pseudocode: Yes
Algorithm 1 (COF-PAC).

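A note on the "two-timescale" in the algorithm's name: convergence arguments of this kind require the actor's step size to decay faster than the critic's, so the fast components (critic and GEM) effectively equilibrate between actor updates. A generic sketch of such schedules (standard two-timescale stochastic-approximation conditions, not the schedules from Algorithm 1):

```python
# Both schedules satisfy the Robbins-Monro conditions (the sum of
# steps diverges, the sum of squared steps converges), and
# beta(t) / alpha(t) -> 0, which separates the two timescales.
def alpha(t):
    return 1.0 / (t + 1) ** 0.6   # fast timescale: critic and emphasis (GEM)

def beta(t):
    return 1.0 / (t + 1)          # slow timescale: actor

# beta(t) / alpha(t) = (t + 1) ** -0.4 -> 0 as t grows.
```
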
Open Source Code: Yes
"All the implementations are made publicly available for future research." Code: https://github.com/ShangtongZhang/DeepRL

Open Datasets: Yes
"We benchmarked COF-PAC, ACE and TD3 (Fujimoto et al., 2018) in Reacher-v2 from OpenAI Gym (Brockman et al., 2016)."

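For reference, a minimal sketch of loading this benchmark with the Gym API current at the time of the paper (circa 2020; Reacher-v2 additionally requires the MuJoCo backend, mujoco-py, whose version the paper does not pin):

```python
import gym  # OpenAI Gym (Brockman et al., 2016)

env = gym.make('Reacher-v2')   # MuJoCo-based reaching task
obs = env.reset()              # pre-0.26 Gym API: reset returns only obs
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```
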
Dataset Splits: No
The paper mentions training and evaluation but does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts).

Hardware Specification: No
The acknowledgments mention a "generous equipment grant from NVIDIA", indicating NVIDIA GPUs were used, but no specific GPU model, CPU, or other detailed hardware specifications for the experiments are provided.

Software Dependencies: No
The paper gives implementation details but no version numbers for software dependencies (e.g., Python, PyTorch, or TensorFlow versions).

Experiment Setup: Yes
"For GEM, we consider a fixed learning rate α and tune it from {0.1 × 2^-1, . . . , 0.1 × 2^-6}. [...] For GEM-ETD(0), we set α1 = 0.025 and tune α2 in the same range as α. [...] Our implementation is based on Zhang et al. (2019), and we inherited their hyperparameters."
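
A minimal sketch of that tuning grid, assuming the exponents in the quoted range are negative (consistent with α1 = 0.025 = 0.1 × 2^-2 lying inside the range):

```python
# Learning-rate grid {0.1 * 2**-1, ..., 0.1 * 2**-6} (assumed negative exponents).
alphas = [0.1 * 2 ** -k for k in range(1, 7)]
# -> [0.05, 0.025, 0.0125, 0.00625, 0.003125, 0.0015625]

alpha_1 = 0.025        # fixed for GEM-ETD(0); equals 0.1 * 2**-2
alpha_2_grid = alphas  # alpha_2 is tuned over the same range as alpha
```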