Stochastic Variance Reduction Methods for Policy Evaluation
Authors: Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on benchmark problems demonstrate the effectiveness of our methods. |
| Researcher Affiliation | Collaboration | 1Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA. 2Microsoft Research, Redmond, Washington 98052, USA. |
| Pseudocode | Yes | Algorithm 1 PDBG for Policy Evaluation (see the PDBG sketch after this table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | In the first task, we consider a randomly generated MDP with 400 states and 10 actions (Dann et al., 2014). ... Next, we test these algorithms on Mountain Car (Sutton & Barto, 1998, Chapter 8). |
| Dataset Splits | No | The paper mentions using a 'fixed, finite dataset' but does not provide specific details on training, validation, or test dataset splits or percentages. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | For step size tuning, σ is chosen from {4e-1, 1e-2, ..., 1e-6} / (1 + λ_max(Ĉ)) and σ_w is chosen from {4e-1, 1e-1, 1e-2} / λ_max(Ĉ). ... for SVRG we choose N = 2n. ... We chose γ = 0.95. (See the SVRG sketch after this table.) |
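
The Pseudocode row cites Algorithm 1 (PDBG, primal-dual batch gradient) for policy evaluation. As a hedged illustration only, the following is a minimal numpy sketch of a batch primal-dual gradient update for the saddle-point formulation of MSPBE minimization that the paper works with, min_θ max_w w^T(b̂ − Âθ) − ½ w^T Ĉ w. The function name `pdbg`, the argument names, and the simultaneous (Jacobi-style) update order are assumptions for illustration, not the authors' reference implementation, and the sketch omits any regularization or stopping criteria.

```python
import numpy as np

def pdbg(Phi, Phi_next, rewards, rho, gamma, sigma_theta, sigma_w, num_iters):
    """Sketch of a primal-dual batch gradient (PDBG-style) update for policy
    evaluation via the saddle-point form of the MSPBE:
        min_theta max_w  w^T (b_hat - A_hat @ theta) - 0.5 * w^T C_hat w
    A_hat, b_hat, C_hat are empirical averages over the fixed dataset;
    rho holds per-sample importance weights (all ones for on-policy data)."""
    n, d = Phi.shape
    A_hat = (rho[:, None] * Phi).T @ (Phi - gamma * Phi_next) / n
    b_hat = (rho * rewards) @ Phi / n
    C_hat = Phi.T @ Phi / n

    theta = np.zeros(d)
    w = np.zeros(d)
    for _ in range(num_iters):
        grad_theta = -A_hat.T @ w                        # d/dtheta of the Lagrangian
        grad_w = b_hat - A_hat @ theta - C_hat @ w       # d/dw of the Lagrangian
        theta = theta - sigma_theta * grad_theta         # primal descent step
        w = w + sigma_w * grad_w                         # dual ascent step
    return theta
```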
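
The Experiment Setup row states that SVRG is run with epoch length N = 2n. Below is a minimal sketch, under the same saddle-point objective as above, of how an SVRG-style primal-dual update with that epoch length might look; the function name, argument names, and update order are illustrative assumptions rather than a reproduction of the paper's SVRG algorithm.

```python
import numpy as np

def svrg_policy_eval(Phi, Phi_next, rewards, rho, gamma,
                     sigma_theta, sigma_w, num_epochs, rng=None):
    """Sketch of an SVRG-style primal-dual method for the saddle-point MSPBE
    objective, with epoch length N = 2n as in the reported setup. Each epoch
    recomputes full gradients at a reference point (theta_ref, w_ref) and uses
    variance-reduced stochastic gradients in the inner loop."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = Phi.shape
    A_hat = (rho[:, None] * Phi).T @ (Phi - gamma * Phi_next) / n
    b_hat = (rho * rewards) @ Phi / n
    C_hat = Phi.T @ Phi / n

    theta, w = np.zeros(d), np.zeros(d)
    for _ in range(num_epochs):
        theta_ref, w_ref = theta.copy(), w.copy()
        # Full (batch) gradients at the reference point.
        g_theta_full = -A_hat.T @ w_ref
        g_w_full = b_hat - A_hat @ theta_ref - C_hat @ w_ref
        for _ in range(2 * n):                           # N = 2n inner steps
            t = rng.integers(n)
            # Per-sample estimates of A, b, C for the drawn transition.
            A_t = rho[t] * np.outer(Phi[t], Phi[t] - gamma * Phi_next[t])
            b_t = rho[t] * rewards[t] * Phi[t]
            C_t = np.outer(Phi[t], Phi[t])
            # Variance-reduced stochastic gradients.
            v_theta = (-A_t.T @ w) - (-A_t.T @ w_ref) + g_theta_full
            v_w = ((b_t - A_t @ theta - C_t @ w)
                   - (b_t - A_t @ theta_ref - C_t @ w_ref) + g_w_full)
            theta = theta - sigma_theta * v_theta        # primal descent step
            w = w + sigma_w * v_w                        # dual ascent step
    return theta
```

Under the quoted setup, `sigma_theta` and `sigma_w` would be selected from grids scaled by 1 / (1 + λ_max(Ĉ)) and 1 / λ_max(Ĉ) respectively, with γ = 0.95.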