Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Variance Reduced Policy Evaluation with Smooth Function Approximation
Authors: Hoi-To Wai, Mingyi Hong, Zhuoran Yang, Zhaoran Wang, Kexin Tang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present preliminary experiments of learning the value function from the Mountain Car dataset with m = 5000 via the nPD-VR algorithm. |
| Researcher Affiliation | Academia | Hoi-To Wai, The Chinese University of Hong Kong, Shatin, Hong Kong; Mingyi Hong, University of Minnesota, Minneapolis, MN, USA; Zhuoran Yang, Princeton University, Princeton, NJ, USA; Zhaoran Wang, Northwestern University, Evanston, IL, USA; Kexin Tang, University of Minnesota, Minneapolis, MN, USA |
| Pseudocode | Yes | Algorithm 1: Nonconvex Primal-Dual Gradient with Variance Reduction (nPD-VR) Algorithm. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We present preliminary experiments of learning the value function from the Mountain Car dataset with m = 5000 via the nPD-VR algorithm. |
| Dataset Splits | No | The paper mentions using the Mountain Car dataset with m = 5000, but it does not provide specific details on how this data was split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions). |
| Experiment Setup | Yes | To learn the value function, we parameterize $V_\theta(\cdot)$ as a 2-layer neural network with $n$ hidden neurons and consider a forgetting factor $\gamma = 0.95$. We set the constraints in (16) with $\Pi = [0, 1]^n$, and in addition we consider $w$ to be bounded in $[0, 100]^n$ for better numerical stability, which can be enforced by incorporating a projection step after (19). For the nPD-VR algorithm, we set the step sizes as $\alpha = 10^{-4}$, $\beta = 10^{-8}$. |
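
The Experiment Setup row mentions a 2-layer value network and a projection step that keeps variables inside box constraints. The following is a minimal sketch of those two ingredients only, not the authors' implementation: the function names `project_box` and `value_fn`, the `tanh` activation, and the example weights are all assumptions introduced for illustration, since the quoted text does not specify them. Projection onto a box reduces to elementwise clipping.

```python
import math

def project_box(x, lo, hi):
    """Euclidean projection of a vector onto the box [lo, hi]^n (elementwise clip)."""
    return [min(max(v, lo), hi) for v in x]

def value_fn(state, W1, b1, w2):
    """Hypothetical 2-layer network: n hidden units, then a linear readout.

    The tanh activation is an assumption; the summary does not name one.
    """
    hidden = [
        math.tanh(sum(w * s for w, s in zip(row, state)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden))

# Illustrative values only (n = 3 hidden units, 2-dimensional state).
W1 = [[0.1, -0.2], [0.0, 0.3], [0.5, 0.5]]
b1 = [0.0, 0.1, -0.1]
w2 = [0.0, 0.0, 0.0]

in_unit_box = project_box([-0.5, 0.3, 1.7], 0.0, 1.0)       # [0.0, 0.3, 1.0]
in_wide_box = project_box([150.0, -2.0, 42.0], 0.0, 100.0)  # [100.0, 0.0, 42.0]
zero_value = value_fn([-0.5, 0.0], W1, b1, w2)              # 0.0 with an all-zero readout
```

In an iterative scheme, a projection of this kind would be applied to the relevant variable right after each update, which is how the quoted setup describes enforcing the bounds.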