Variance Reduced Policy Evaluation with Smooth Function Approximation
Authors: Hoi-To Wai, Mingyi Hong, Zhuoran Yang, Zhaoran Wang, Kexin Tang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present preliminary experiments of learning the value function from the Mountain Car dataset with m = 5000 via the nPD-VR algorithm. |
| Researcher Affiliation | Academia | Hoi-To Wai, The Chinese University of Hong Kong, Shatin, Hong Kong (htwai@se.cuhk.edu.hk); Mingyi Hong, University of Minnesota, Minneapolis, MN, USA (mhong@umn.edu); Zhuoran Yang, Princeton University, Princeton, NJ, USA (zy6@princeton.edu); Zhaoran Wang, Northwestern University, Evanston, IL, USA (zhaoranwang@gmail.com); Kexin Tang, University of Minnesota, Minneapolis, MN, USA (tangk@umn.edu) |
| Pseudocode | Yes | Algorithm 1: Nonconvex Primal-Dual Gradient with Variance Reduction (nPD-VR) Algorithm. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We present preliminary experiments of learning the value function from the Mountain Car dataset with m = 5000 via the nPD-VR algorithm. |
| Dataset Splits | No | The paper mentions using the Mountain Car dataset with m = 5000, but it does not provide specific details on how this data was split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions). |
| Experiment Setup | Yes | To learn the value function, we parameterize $V_\theta(\cdot)$ as a 2-layer neural network with $n$ hidden neurons and consider a forgetting factor $\gamma = 0.95$. We set the constraints in (16) with $\Theta = [0, 1]^n$, and in addition we consider $w$ to be bounded in $[0, 100]^n$ for better numerical stability, which can be enforced by incorporating a projection step after (19). For the nPD-VR algorithm, we set the step sizes as $\alpha = 10^{-4}$, $\beta = 10^{-8}$. |
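
The box constraints quoted in the Experiment Setup row can be enforced by an elementwise clip, which is the Euclidean projection onto a box. The sketch below is illustrative only and is not the authors' code: the helper names `project_box` and `value_net`, the tanh activation, the hidden width `n = 8`, and the choice to treat only the output-layer weights as the constrained parameter are all assumptions made for this example.

```python
import numpy as np

GAMMA = 0.95  # forgetting factor quoted in the setup


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (elementwise clip)."""
    return np.clip(x, lo, hi)


def value_net(state, W1, b1, theta):
    """Hypothetical 2-layer value network: tanh hidden layer, linear output.

    Assumption: only the output weights `theta` are the constrained parameter.
    """
    hidden = np.tanh(W1 @ state + b1)
    return float(theta @ hidden)


rng = np.random.default_rng(0)
n = 8                              # number of hidden neurons (illustrative)
W1 = rng.normal(size=(n, 2))       # Mountain Car state: (position, velocity)
b1 = np.zeros(n)

# After a primal or dual update, project back onto the feasible boxes.
theta = project_box(rng.normal(size=n), 0.0, 1.0)               # Theta = [0, 1]^n
w = project_box(rng.normal(scale=200.0, size=n), 0.0, 100.0)    # w in [0, 100]^n

v = value_net(np.array([-0.5, 0.0]), W1, b1, theta)
```

Clipping is used here because the projection onto a box decouples across coordinates, so it costs O(n) per iteration and adds no tuning parameters.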