Stochastic Variance Reduction Methods for Policy Evaluation

Authors: Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical experiments on benchmark problems demonstrate the effectiveness of our methods."
Researcher Affiliation | Collaboration | (1) Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA; (2) Microsoft Research, Redmond, Washington 98052, USA.
Pseudocode | Yes | The paper gives "Algorithm 1 PDBG for Policy Evaluation"; a hedged sketch of this primal-dual update appears after the table.
Open Source Code | No | The paper provides no explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | "In the first task, we consider a randomly generated MDP with 400 states and 10 actions (Dann et al., 2014). ... Next, we test these algorithms on Mountain Car (Sutton & Barto, 1998, Chapter 8)."
Dataset Splits | No | The paper mentions using a "fixed, finite dataset" but gives no training/validation/test split details or percentages.
Hardware Specification | No | The paper does not describe the hardware used to run the experiments.
Software Dependencies | No | The paper does not specify software dependencies or version numbers.
Experiment Setup | Yes | For step size tuning, σ_θ is chosen from {4e-1, 1e-2, ..., 1e-6} / (1 + λ_max(Ĉ)) and σ_w is chosen from {4e-1, 1e-1, 1e-2} / λ_max(Ĉ). ... For SVRG, N = 2n inner iterations are used (see the sketch below the table). ... γ = 0.95.
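As context for the Pseudocode row, here is a minimal numpy sketch of a primal-dual batch gradient (PDBG) step for policy evaluation, assuming the standard saddle-point reformulation of the empirical MSPBE that the paper builds on. The function name `pdbg`, the zero initialization, and the fixed iteration count are illustrative assumptions; since the table records that no source code was released, this is not the authors' implementation.

```python
import numpy as np

def pdbg(A_hat, b_hat, C_hat, sigma_theta, sigma_w, num_iters=1000):
    """Primal-dual batch gradient (PDBG) sketch for policy evaluation.

    Solves the saddle point
        min_theta max_w  w^T (b_hat - A_hat @ theta) - 0.5 * w^T C_hat w,
    whose primal solution minimizes the empirical MSPBE
        0.5 * ||A_hat @ theta - b_hat||^2 in the C_hat^{-1} norm.
    A_hat, b_hat, C_hat are the empirical moments of the n observed
    transitions (see the SVRG sketch below for their construction).
    """
    d = A_hat.shape[1]
    theta, w = np.zeros(d), np.zeros(d)
    for _ in range(num_iters):
        # Full-batch gradient descent in theta, ascent in w.
        theta = theta + sigma_theta * (A_hat.T @ w)
        w = w + sigma_w * (b_hat - A_hat @ theta - C_hat @ w)
    return theta
```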
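The Experiment Setup row quotes N = 2n inner iterations for SVRG. Under the same saddle-point assumption, the sketch below shows how an SVRG-style variance-reduced primal-dual loop with that epoch length might look; the function name, uniform sampling, and defaults are assumptions rather than the paper's reference code.

```python
import numpy as np

def svrg_policy_eval(Phi, Phi_next, R, gamma, sigma_theta, sigma_w,
                     num_epochs=50, N=None, seed=0):
    """SVRG-style variance-reduced primal-dual sketch for policy evaluation.

    Phi, Phi_next: (n, d) feature matrices for states s_t and s_t';
    R: (n,) rewards. N defaults to 2n inner iterations per epoch,
    matching the quoted experiment setup.
    """
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    if N is None:
        N = 2 * n
    # Empirical moments of the n transitions.
    Diff = Phi - gamma * Phi_next        # row t is (phi_t - gamma * phi_t')^T
    A_hat = Phi.T @ Diff / n
    b_hat = Phi.T @ R / n
    C_hat = Phi.T @ Phi / n

    theta, w = np.zeros(d), np.zeros(d)
    for _ in range(num_epochs):
        # Snapshot point and its full-batch gradients.
        theta_s, w_s = theta.copy(), w.copy()
        g_theta = -A_hat.T @ w_s                     # primal gradient at snapshot
        g_w = b_hat - A_hat @ theta_s - C_hat @ w_s  # dual gradient at snapshot
        for _ in range(N):
            t = rng.integers(n)
            phi, diff = Phi[t], Diff[t]
            # Per-sample matrices are rank one: A_t = phi diff^T, C_t = phi phi^T.
            # Variance-reduced gradient: stochastic gradient at the current
            # point, minus the same at the snapshot, plus the full gradient.
            vr_theta = -diff * (phi @ (w - w_s)) + g_theta
            vr_w = (-phi * (diff @ (theta - theta_s))
                    - phi * (phi @ (w - w_s)) + g_w)
            theta = theta - sigma_theta * vr_theta   # primal descent
            w = w + sigma_w * vr_w                   # dual ascent
    return theta
```

In the quoted setup, sigma_theta and sigma_w would then be tuned by grid search over the listed candidate sets, each scaled by the largest eigenvalue of C_hat (e.g., computed via np.linalg.eigvalsh(C_hat).max()).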