PILOT: An $\mathcal{O}(1/K)$-Convergent Approach for Policy Evaluation with Nonlinear Function Approximation
Authors: Zhuqing Liu, Xin Zhang, Jia Liu, Zhengyuan Zhu, Songtao Lu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct our numerical experiments to verify our theoretical results. We compare our work with the basic stochastic gradient (SG) method (Lin et al., 2020b) and three state-of-the-art algorithms for PE: nPD-VR (Wai et al., 2019), STSG (Qiu et al., 2020) and VR-STSG (Qiu et al., 2020). Due to space limitation, we provide our detailed experiment settings in the Appendix. Numerical Results: First, we compare the loss value and gradient norm performance based on Mountain Car-v0 and Cartpole-v0 with nPD-VR, SG, STSG, and VR-STSG in Figs. 1 and 2. |
| Researcher Affiliation | Collaboration | Zhuqing Liu, Xin Zhang, Jia Liu, Zhengyuan Zhu, Songtao Lu; Department of Electrical and Computer Engineering, The Ohio State University; Department of Statistics, Iowa State University; IBM Research, IBM Thomas J. Watson Research Center |
| Pseudocode | Yes | Algorithm 1: The path-integrated primal-dual stochastic gradient (PILOT); Algorithm 2: Adaptive-batch PILOT method (PILOT+). |
| Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the release of source code for the described methodology. |
| Open Datasets | Yes | Numerical Results: First, we compare the loss value and gradient norm performance based on Mountain Car-v0 and Cartpole-v0 with nPD-VR, SG, STSG, and VR-STSG in Figs. 1 and 2. |
| Dataset Splits | No | The paper mentions 'training data' but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions "OpenAI Gym" but does not provide specific software dependency versions (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9, or specific Gym version). |
| Experiment Setup | No | Due to space limitation, we provide our detailed experiment settings in the Appendix. |