Variance Reduced Policy Evaluation with Smooth Function Approximation

Authors: Hoi-To Wai, Mingyi Hong, Zhuoran Yang, Zhaoran Wang, Kexin Tang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present preliminary experiments of learning the value function from the Mountain Car dataset with m = 5000 via the nPD-VR algorithm.
Researcher Affiliation | Academia | Hoi-To Wai, The Chinese University of Hong Kong, Shatin, Hong Kong, htwai@se.cuhk.edu.hk; Mingyi Hong, University of Minnesota, Minneapolis, MN, USA, mhong@umn.edu; Zhuoran Yang, Princeton University, Princeton, NJ, USA, zy6@princeton.edu; Zhaoran Wang, Northwestern University, Evanston, IL, USA, zhaoranwang@gmail.com; Kexin Tang, University of Minnesota, Minneapolis, MN, USA, tangk@umn.edu
Pseudocode | Yes | Algorithm 1 Nonconvex Primal-Dual Gradient with Variance Reduction (nPD-VR) Algorithm.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | We present preliminary experiments of learning the value function from the Mountain Car dataset with m = 5000 via the nPD-VR algorithm.
Dataset Splits | No | The paper mentions using the Mountain Car dataset with m = 5000, but it does not provide specific details on how this data was split into training, validation, or test sets.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions).
Experiment Setup | Yes | To learn the value function, we parameterize V_θ(·) as a 2-layer neural network with n hidden neurons and consider a forgetting factor γ = 0.95. We set the constraints in (16) with Θ = [0, 1]^n, and in addition we consider w to be bounded in [0, 100]^n for better numerical stability, which can be enforced by incorporating a projection step after (19). For the nPD-VR algorithm, we set the step sizes as α = 10^{-4}, β = 10^{-8}.
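
The experiment setup mentions a 2-layer value network and box constraints enforced by a projection step. The following is a minimal sketch of those two pieces only, not the paper's Algorithm 1. Updates (16) and (19) are not reproduced on this page, so the gradient passed to `projected_step` is a placeholder. The function names, the tanh activation, and the bias-free network layout are all assumptions made for illustration.

```python
import math

GAMMA = 0.95  # forgetting factor reported in the setup (unused in this sketch)


def project_box(x, lo, hi):
    """Euclidean projection of a vector onto the box [lo, hi]^len(x)."""
    return [min(max(xi, lo), hi) for xi in x]


def value_net(state, hidden_weights, output_weights):
    """Hypothetical 2-layer network: V(s) = sum_j a_j * tanh(<W_j, s>).

    The activation and the absence of bias terms are assumptions;
    the page only states "2-layer neural network with n hidden neurons".
    """
    hidden = [
        math.tanh(sum(w * s for w, s in zip(row, state)))
        for row in hidden_weights
    ]
    return sum(a * h for a, h in zip(output_weights, hidden))


def projected_step(x, grad, step, lo, hi, ascent=False):
    """One gradient step followed by projection onto [lo, hi]^len(x).

    `grad` is a placeholder for whatever gradient estimate the
    algorithm supplies; it is not computed here.
    """
    sign = 1.0 if ascent else -1.0
    moved = [xi + sign * step * gi for xi, gi in zip(x, grad)]
    return project_box(moved, lo, hi)


# Example: a descent step with step size 1e-4 (the reported value of alpha),
# kept inside [0, 1]; for w the same helper would be called with bounds [0, 100].
theta = projected_step([0.5, 0.99995], [1.0, -1.0], 1e-4, 0.0, 1.0)
```

Clipping each coordinate is the exact Euclidean projection onto a box, which is why the constraint adds only a per-coordinate min/max after each update.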