PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Authors: Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement the theory with empirical evaluation across a variety of domains in both reward-free and reward-driven settings.
Researcher Affiliation | Collaboration | Alekh Agarwal (Microsoft Research, Redmond); Sham Kakade (University of Washington and Microsoft Research NYC); Mikael Henaff (Facebook AI Research); Wen Sun (Cornell University)
Pseudocode | Yes | Algorithm 1: d sampler and Q estimator; Algorithm 2: POLICY COVER GUIDED POLICY GRADIENT (PC-PG); Algorithm 3: Natural Policy Gradient (NPG) update. (A schematic sketch of the policy-cover loop appears after this table.)
Open Source Code | No | The paper cites a third-party open-source project ('Modularized implementation of deep rl algorithms in pytorch. https://github.com/ShangtongZhang/DeepRL, 2018.') but does not provide concrete access to source code for the PC-PG method described in this paper.
Open Datasets | Yes | We further evaluated PC-PG on continuous-control Mountain Car from OpenAI Gym [15]; here actions are continuous in [-1, 1] and incur a small negative reward. We also evaluated PC-PG in a reward-free setting using maze environments adapted from [46]. (A short Gym usage sketch follows the table below.)
Dataset Splits | No | The paper describes the environments used (Bidirectional Diabolical Combination Lock, mazes, continuous-control Mountain Car) but does not provide percentages or counts for training, validation, and test splits of collected data, nor does it reference predefined splits for these interactive environments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for its experiments.
Software Dependencies | No | The paper mentions software such as 'PyTorch [48]' but does not give version numbers for PyTorch or any other key software components used in the experiments.
Experiment Setup | No | The paper states, 'Details of the implemented algorithm, network architectures and kernels can be found in Appendix I.', so specific experimental setup details such as hyperparameters and network architectures are not provided in the main text.
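
To make the pseudocode row concrete, the following is a minimal, schematic sketch of the policy-cover loop that Algorithms 1-3 describe: maintain a cover of previously learned policies, estimate how well the cover's state-action distribution spans the feature space, reward the policy for visiting poorly covered regions, and improve it with NPG on the bonus-augmented reward. The helpers `rollout_mixture`, `features`, and `npg_update` are hypothetical placeholders for the paper's sampling, estimation, and update procedures, not the authors' implementation.

```python
import numpy as np

def pc_pg(init_policy, rollout_mixture, features, npg_update,
          n_epochs=5, n_samples=1000, bonus_scale=1.0, reg=1e-3, dim=8):
    """Schematic PC-PG outer loop (the paper's Algorithms 1-3 give exact forms).

    rollout_mixture(cover, n) -> iterable of (s, a) pairs drawn by rolling out
        a uniform mixture over the policies in the cover.
    features(s, a)            -> feature vector phi(s, a) of length `dim`.
    npg_update(policy, bonus) -> policy improved by NPG on reward + bonus.
    """
    cover = [init_policy]                      # policy cover from past epochs
    policy = init_policy
    for _ in range(n_epochs):
        # Feature covariance of the cover's state-action distribution.
        sigma = reg * np.eye(dim)
        for s, a in rollout_mixture(cover, n_samples):
            phi = features(s, a)
            sigma += np.outer(phi, phi) / n_samples
        sigma_inv = np.linalg.inv(sigma)

        # Exploration bonus: large where the cover's feature coverage is poor.
        def bonus(s, a, sigma_inv=sigma_inv):
            phi = features(s, a)
            return bonus_scale * float(np.sqrt(phi @ sigma_inv @ phi))

        policy = npg_update(policy, bonus)     # improve on bonus-augmented reward
        cover.append(policy)                   # grow the cover
    return cover, policy

# Toy usage with stand-in helpers (illustration only).
rng = np.random.default_rng(0)
toy_rollouts = lambda cover, n: [((), 0) for _ in range(n)]
toy_features = lambda s, a: rng.standard_normal(8)
toy_npg = lambda policy, bonus: policy
cover, policy = pc_pg("pi0", toy_rollouts, toy_features, toy_npg)
```

The bonus of the form sqrt(phi' Sigma^-1 phi) is large exactly where the cover's feature covariance is deficient, which is what gives the directed exploration suggested by the algorithm's name.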
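
The continuous-control experiment quoted in the Open Datasets row uses Mountain Car from OpenAI Gym. As a small usage sketch (assuming the standard environment ID `MountainCarContinuous-v0` and the classic Gym step API; the paper's actual training setup is in its Appendix I), the continuous action bounds in [-1, 1] and the per-step reward can be inspected like this:

```python
import gym  # classic OpenAI Gym API; newer installs expose the same env via `gymnasium`

env = gym.make("MountainCarContinuous-v0")
print(env.action_space)  # Box(-1.0, 1.0, (1,), float32): continuous actions in [-1, 1]

obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()           # random policy, for illustration only
    obs, reward, done, info = env.step(action)   # nonzero actions incur a small negative reward
    total_reward += reward
env.close()
print("episode return:", total_reward)
```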