Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Authors: Pan Xu, Felicia Gao, Quanquan Gu

ICLR 2020

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. Evidence from the paper: "We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms." and "Our experimental results on classical control tasks in reinforcement learning demonstrate the superior performance of the proposed SRVR-PG and SRVR-PG-PE algorithms and verify our theoretical analysis."
Researcher Affiliation: Academia. Pan Xu, Felicia Gao, Quanquan Gu; Department of Computer Science, University of California, Los Angeles; Los Angeles, CA 90095, USA; panxu@cs.ucla.edu, fxgao1160@engineering.ucla.edu, qgu@cs.ucla.edu
Pseudocode: Yes. The paper presents Algorithm 1, Stochastic Recursive Variance Reduced Policy Gradient (SRVR-PG), and Algorithm 2, Stochastic Recursive Variance Reduced Policy Gradient with Parameter-based Exploration (SRVR-PG-PE).
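
Since the paper provides pseudocode but no released code, a minimal Python sketch of the Algorithm 1 update scheme may help readers attempting a re-implementation. The helpers sample_trajectories, grad_estimate, and importance_weight are hypothetical placeholders (a trajectory sampler, a REINFORCE-style gradient, and a trajectory likelihood ratio); they are assumptions for illustration, not functions from the paper.

```python
import numpy as np

def srvr_pg(theta0, S, m, N, B, eta,
            sample_trajectories, grad_estimate, importance_weight):
    """Minimal sketch of the SRVR-PG update scheme (Algorithm 1).

    sample_trajectories(theta, n): draw n trajectories from policy pi_theta.
    grad_estimate(theta, tau): REINFORCE-style gradient g(tau | theta).
    importance_weight(theta_ref, theta_cur, tau): likelihood ratio
        p_{theta_ref}(tau) / p_{theta_cur}(tau); needed because trajectories
        are sampled from the current policy while the correction term uses
        the gradient at the previous iterate.
    All three helpers are hypothetical placeholders, not from the paper.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(S):  # outer epochs
        # Reference (snapshot) gradient on a large batch of N trajectories.
        batch = sample_trajectories(theta, N)
        v = np.mean([grad_estimate(theta, tau) for tau in batch], axis=0)
        theta_prev, theta = theta, theta + eta * v  # gradient ascent step
        for _ in range(m - 1):  # inner recursive refinements
            mini = sample_trajectories(theta, B)
            # SARAH/SPIDER-style recursive correction of the estimator.
            delta = np.mean(
                [grad_estimate(theta, tau)
                 - importance_weight(theta_prev, theta, tau)
                   * grad_estimate(theta_prev, tau)
                 for tau in mini], axis=0)
            v = v + delta
            theta_prev, theta = theta, theta + eta * v
    return theta
```

The recursive estimator reuses the previous direction v and corrects it with a small batch of B trajectories, which is the source of the sample-efficiency gain over recomputing a full gradient at every step.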
Open Source Code: No. The paper contains no statement or link indicating that source code for the proposed SRVR-PG or SRVR-PG-PE algorithms is publicly available.
Open Datasets: Yes. Evidence from the paper: "We provide experiment results of the proposed algorithm on benchmark reinforcement learning environments including the Cartpole, Mountain Car and Pendulum problems."
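
The three named tasks are standard control benchmarks. Assuming the OpenAI Gym implementations (the paper does not name the exact simulator or versions), they could be instantiated as follows; the environment IDs are illustrative guesses, not taken from the paper.

```python
import gym

# Illustrative Gym IDs for the three control tasks named in the paper;
# e.g., Pendulum is "Pendulum-v0" in older Gym releases.
for env_id in ["CartPole-v1", "MountainCarContinuous-v0", "Pendulum-v1"]:
    env = gym.make(env_id)
    print(env_id, env.observation_space, env.action_space)
```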
Dataset Splits: No. The paper mentions "benchmark reinforcement learning environments" but gives no train/validation/test splits; these are simulated environments in which policies are learned through interaction rather than from a fixed dataset.
Hardware Specification: No. The paper does not describe the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No. The paper mentions a Gaussian policy and a grid search for tuning parameters, but it does not list software dependencies with version numbers (e.g., Python, TensorFlow, or PyTorch versions).
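
As an illustration of the Gaussian policy mentioned above, the sketch below computes the score-function term needed by a REINFORCE-style gradient for a linear-mean Gaussian policy with fixed standard deviation. This is a common textbook parameterization, assumed here for illustration; the paper's actual policy network may differ.

```python
import numpy as np

def gaussian_logprob_grad(theta, s, a, sigma=1.0):
    """Gradient of log N(a; theta @ s, sigma^2) with respect to theta.

    Assumes a scalar action with linear mean theta @ s and fixed std sigma;
    an illustrative parameterization, not necessarily the paper's.
    """
    mean = theta @ s
    return ((a - mean) / sigma**2) * s
```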
Experiment Setup: Yes. Evidence from the paper: "The detailed parameters used in the experiments are presented in Appendix E." Table 2 lists the parameters used in the SRVR-PG experiments and Table 3 those used in the SRVR-PG-PE experiments, including specific hyperparameters such as NN size, task horizon, discount factor γ, learning rate η, batch size N, batch size B, and epoch size m.
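
To make those hyperparameter categories concrete, the snippet below shows how such a configuration might be organized in code for the SRVR-PG sketch above. The values are illustrative placeholders, not the actual numbers from Tables 2 and 3 in Appendix E.

```python
# Placeholder configuration mirroring the hyperparameter categories
# reported in Appendix E; the values are illustrative, not the paper's.
srvr_pg_config = {
    "nn_size": (64, 64),   # hidden layer sizes of the policy network
    "task_horizon": 200,   # maximum trajectory length
    "gamma": 0.99,         # discount factor
    "eta": 0.01,           # learning rate (step size)
    "N": 100,              # outer-loop (reference) batch size
    "B": 10,               # inner-loop mini-batch size
    "m": 3,                # epoch size (inner-loop length)
}
```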