Off-Policy Interval Estimation with Lipschitz Value Iteration
Authors: Ziyang Tang, Yihao Feng, Na Zhang, Jian Peng, Qiang Liu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our algorithm on a number of benchmarks and show that it can provide tight and provably correct bounds. |
| Researcher Affiliation | Academia | Ziyang Tang, University of Texas at Austin, ztang@utexas.edu; Yihao Feng, University of Texas at Austin, yihao@cs.utexas.edu; Na Zhang, Tsinghua University, zhangna@pbcsf.tsinghua.edu.cn; Jian Peng, University of Illinois at Urbana-Champaign, jianpeng@illinois.edu; Qiang Liu, University of Texas at Austin, lqiang@cs.utexas.edu |
| Pseudocode | Yes | Algorithm 1 Lipschitz Value Iteration (for Upper Bound); Algorithm 2 Lipschitz Value (Upper Bound) Iteration with Stochastic Update |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There is no explicit statement of code release, nor a link to a code repository. |
| Open Datasets | No | The paper mentions using "Transition data $D = \{s_i, a_i, s'_i, r_i\}_{1 \le i \le n}$" and performs experiments on "Synthesis Environment with A Known Value Function", "Pendulum Environment", and "HIV Simulator". While these are known environments, the paper does not provide specific access information (link, DOI, citation with authors/year) for a publicly available or open dataset that was used for training or was made available by the authors for replication. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The default settings: number of trajectories $n_t = 30$, horizon length $H = 100$, discount factor $\gamma = 0.95$, Lipschitz constant $\eta = 2.0$, and subsample size $n_B = 500$. We run Lipschitz value iteration for 100 iterations to ensure near convergence. |
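
For anyone attempting a replication, the default settings in the Experiment Setup row can be gathered into a single configuration object. The sketch below is only illustrative: the class name `LipschitzVIConfig`, its field names, and the two helper methods are hypothetical and do not come from the paper, and the transition count assumes that every trajectory contributes exactly H transitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LipschitzVIConfig:
    """Hypothetical container for the reported default settings."""

    num_trajectories: int = 30        # n_t
    horizon: int = 100                # H
    discount: float = 0.95            # gamma
    lipschitz_constant: float = 2.0   # eta
    subsample_size: int = 500         # n_B
    num_iterations: int = 100         # Lipschitz value iteration steps

    def num_transitions(self) -> int:
        # Assumption: each trajectory yields exactly `horizon` transitions.
        return self.num_trajectories * self.horizon

    def effective_horizon(self) -> float:
        # Standard 1 / (1 - gamma) rule of thumb for a discounted return.
        return 1.0 / (1.0 - self.discount)


config = LipschitzVIConfig()
print(config.num_transitions())
print(config.effective_horizon())
```

Under that assumption the defaults imply a pool of 3000 transitions, from which 500 would be subsampled, and an effective discounted horizon of about 20 steps.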