Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds
Authors: Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results that clearly demonstrate the advantages of our approach over existing methods. (...) We present our main approach in Section 4 and perform empirical studies in Section 5. |
| Researcher Affiliation | Academia | Yihao Feng *, Ziyang Tang University of Texas at Austin EMAIL Na Zhang Tsinghua University EMAIL Qiang Liu University of Texas at Austin EMAIL |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We test our method on three environments: Inverted Pendulum and Cart Pole from OpenAI Gym (Brockman et al., 2016), and a Type-1 Diabetes medical treatment simulator.¹ (...) ¹ https://github.com/jxx123/simglucose |
| Dataset Splits | Yes | The bandwidth of k and k are selected to make sure the function Bellman loss is not large on a validation set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions using 'PPO (Schulman et al., 2017)' as a policy training method but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or specific library versions). |
| Experiment Setup | Yes | For horizon lengths, we fix γ = 0.95 and set horizon length H = 50 for Inverted-Pendulum, H = 100 for Cart Pole, and H = 50 for Diabetes simulator. (...) We take both kernels to be Gaussian RBF kernel and choose r_Q and the bandwidths of k and k using the procedure in Appendix H.2. We use a fast approximation method to optimize ω in F_Q^+(ω) and F_Q^-(ω) as shown in Appendix D. |
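
The Experiment Setup row can be read as a small configuration plus two standard building blocks. The following is a minimal sketch under stated assumptions, not the authors' code: the discount factor and horizon lengths are taken from the row above, while the dictionary keys, the function names, and the `exp(-||x - y||^2 / (2 * bandwidth^2))` parameterization of the Gaussian RBF kernel are illustrative choices that the paper excerpt does not confirm.

```python
import math

# Values quoted in the Experiment Setup row; the key names are hypothetical.
GAMMA = 0.95
HORIZON = {"inverted_pendulum": 50, "cart_pole": 100, "diabetes": 50}


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over one finite-horizon trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def rbf_kernel(x, y, bandwidth):
    """Gaussian RBF kernel, assuming the exp(-||x - y||^2 / (2 h^2)) form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))
```

For example, `discounted_return([1.0] * HORIZON["cart_pole"])` gives the discounted value of a constant unit reward over the Cart Pole horizon. The bandwidth passed to `rbf_kernel` would come from a validation procedure such as the one the paper places in Appendix H.2, which is not reproduced here.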