reproducibilityindex.ai

Variance Reduced Policy Evaluation with Smooth Function Approximation

Authors: Hoi-To Wai, Mingyi Hong, Zhuoran Yang, Zhaoran Wang, Kexin Tang

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present preliminary experiments of learning the value function from the Mountain Car dataset with m = 5000 via the nPD-VR algorithm.
Researcher Affiliation	Academia	Hoi-To Wai, The Chinese University of Hong Kong, Shatin, Hong Kong, htwai@se.cuhk.edu.hk; Mingyi Hong, University of Minnesota, Minneapolis, MN, USA, mhong@umn.edu; Zhuoran Yang, Princeton University, Princeton, NJ, USA, zy6@princeton.edu; Zhaoran Wang, Northwestern University, Evanston, IL, USA, zhaoranwang@gmail.com; Kexin Tang, University of Minnesota, Minneapolis, MN, USA, tangk@umn.edu
Pseudocode	Yes	Algorithm 1: Nonconvex Primal-Dual Gradient with Variance Reduction (nPD-VR) Algorithm.
Open Source Code	No	The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets	Yes	We present preliminary experiments of learning the value function from the Mountain Car dataset with m = 5000 via the nPD-VR algorithm.
Dataset Splits	No	The paper mentions using the Mountain Car dataset with m = 5000, but it does not provide specific details on how this data was split into training, validation, or test sets.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU or CPU models used for running the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions).
Experiment Setup	Yes	To learn the value function, we parameterize V_θ(·) as a 2-layer neural network with n hidden neurons and consider a forgetting factor γ = 0.95. We set the constraints in (16) with Θ = [0, 1]^n, and in addition we consider w to be bounded in [0, 100]^n for better numerical stability, which can be enforced by incorporating a projection step after (19). For the nPD-VR algorithm, we set the step sizes as α = 10^-4, β = 10^-8.
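
The projection step mentioned in the Experiment Setup row can be illustrated with a minimal sketch. It assumes that projecting onto a box constraint such as [0, 1]^n or [0, 100]^n amounts to coordinate-wise clipping, which is the standard Euclidean projection onto a box. The function names `project_box` and `projected_step` are hypothetical, and the update inside `projected_step` is a generic gradient-style move, not the paper's update (19), which is not reproduced on this page.

```python
# Illustrative sketch only: box projection applied after a generic update.
# The names and the update rule are assumptions, not taken from the paper.

def project_box(x, lo, hi):
    """Euclidean projection of a vector onto the box [lo, hi]^n.

    For a box, the projection decouples across coordinates, so each
    entry is simply clipped to the interval [lo, hi].
    """
    return [min(max(v, lo), hi) for v in x]


def projected_step(x, direction, step, lo, hi):
    """Move x along `direction` with step size `step`, then project.

    `direction` stands in for whatever update direction the algorithm
    supplies (hypothetical placeholder for the paper's update (19)).
    """
    moved = [xi + step * di for xi, di in zip(x, direction)]
    return project_box(moved, lo, hi)


if __name__ == "__main__":
    # Primal parameters constrained to [0, 1]^n.
    theta = projected_step([0.5, 0.5], [10.0, -10.0], 0.1, 0.0, 1.0)
    # Dual variable constrained to [0, 100]^n.
    w = project_box([120.0, 50.0], 0.0, 100.0)
    print(theta, w)
```

Clipping keeps both sets of variables inside their boxes after every update, which is one plausible way to obtain the numerical stability the excerpt refers to.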