PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Authors: Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement the theory with empirical evaluation across a variety of domains in both reward-free and reward-driven settings.
Researcher Affiliation | Collaboration | Alekh Agarwal (Microsoft Research, Redmond); Sham Kakade (University of Washington and Microsoft Research NYC); Mikael Henaff (Facebook AI Research); Wen Sun (Cornell University)
Pseudocode | Yes | Algorithm 1: d sampler and Q estimator; Algorithm 2: POLICY COVER GUIDED POLICY GRADIENT (PC-PG); Algorithm 3: Natural Policy Gradient (NPG) update. (A schematic sketch of the policy-cover loop appears after this table.)
Open Source Code | No | The paper cites a third-party open-source project ('Modularized implementation of deep rl algorithms in pytorch. https://github.com/ShangtongZhang/DeepRL, 2018.') but does not provide concrete access to source code for the PC-PG method described in this paper.
Open Datasets | Yes | We further evaluated PC-PG on continuous-control Mountain Car from OpenAI Gym [15]; here actions are continuous in [-1, 1] and incur a small negative reward. We also evaluated PC-PG in a reward-free setting using maze environments adapted from [46]. (A short Gym usage sketch follows the table below.)
Dataset Splits | No | The paper describes the environments used (Bidirectional Diabolical Combination Lock, mazes, continuous-control Mountain Car) but does not provide percentages or counts for training, validation, and test splits of collected data, nor does it reference predefined splits for these interactive environments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for its experiments.
Software Dependencies | No | The paper mentions software such as 'PyTorch [48]' but does not give version numbers for PyTorch or any other key software components used in the experiments.
Experiment Setup | No | The paper states, 'Details of the implemented algorithm, network architectures and kernels can be found in Appendix I.', so specific experimental setup details such as hyperparameters and network architectures are not provided in the main text.
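
To make the pseudocode row concrete, the following is a minimal, schematic sketch of the policy-cover loop that Algorithms 1-3 describe: maintain a cover of previously learned policies, estimate how well the cover's state-action distribution spans the feature space, reward the policy for visiting poorly covered regions, and improve it with NPG on the bonus-augmented reward. The helpers `rollout_mixture`, `features`, and `npg_update` are hypothetical placeholders for the paper's sampling, estimation, and update procedures, not the authors' implementation.

```python
import numpy as np

def pc_pg(init_policy, rollout_mixture, features, npg_update,
          n_epochs=5, n_samples=1000, bonus_scale=1.0, reg=1e-3, dim=8):
    """Schematic PC-PG outer loop (the paper's Algorithms 1-3 give exact forms).

    rollout_mixture(cover, n) -> iterable of (s, a) pairs drawn by rolling out
        a uniform mixture over the policies in the cover.
    features(s, a)            -> feature vector phi(s, a) of length `dim`.
    npg_update(policy, bonus) -> policy improved by NPG on reward + bonus.
    """
    cover = [init_policy]                      # policy cover from past epochs
    policy = init_policy
    for _ in range(n_epochs):
        # Feature covariance of the cover's state-action distribution.
        sigma = reg * np.eye(dim)
        for s, a in rollout_mixture(cover, n_samples):
            phi = features(s, a)
            sigma += np.outer(phi, phi) / n_samples
        sigma_inv = np.linalg.inv(sigma)

        # Exploration bonus: large where the cover's feature coverage is poor.
        def bonus(s, a, sigma_inv=sigma_inv):
            phi = features(s, a)
            return bonus_scale * float(np.sqrt(phi @ sigma_inv @ phi))

        policy = npg_update(policy, bonus)     # improve on bonus-augmented reward
        cover.append(policy)                   # grow the cover
    return cover, policy

# Toy usage with stand-in helpers (illustration only).
rng = np.random.default_rng(0)
toy_rollouts = lambda cover, n: [((), 0) for _ in range(n)]
toy_features = lambda s, a: rng.standard_normal(8)
toy_npg = lambda policy, bonus: policy
cover, policy = pc_pg("pi0", toy_rollouts, toy_features, toy_npg)
```

The bonus of the form sqrt(phi' Sigma^-1 phi) is large exactly where the cover's feature covariance is deficient, which is what gives the directed exploration suggested by the algorithm's name.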
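
The continuous-control experiment quoted in the Open Datasets row uses Mountain Car from OpenAI Gym. As a small usage sketch (assuming the standard environment ID `MountainCarContinuous-v0` and the classic Gym step API; the paper's actual training setup is in its Appendix I), the continuous action bounds in [-1, 1] and the per-step reward can be inspected like this:

```python
import gym  # classic OpenAI Gym API; newer installs expose the same env via `gymnasium`

env = gym.make("MountainCarContinuous-v0")
print(env.action_space)  # Box(-1.0, 1.0, (1,), float32): continuous actions in [-1, 1]

obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()           # random policy, for illustration only
    obs, reward, done, info = env.step(action)   # nonzero actions incur a small negative reward
    total_reward += reward
env.close()
print("episode return:", total_reward)
```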