Accountable Off-Policy Evaluation With Kernel Bellman Statistics
Authors: Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that our method yields tight confidence intervals in different settings. |
| Researcher Affiliation | Academia | Department of Computer Science, The University of Texas at Austin. |
| Pseudocode | Yes | Algorithm 1: Confidence Bounds for Off-Policy Evaluation; Algorithm 2: Post-hoc Diagnosis for Existing Estimators (a hedged sketch of the kernel Bellman statistic behind these algorithms follows the table). |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described. |
| Open Datasets | Yes | We use OpenAI Gym environments (Brockman et al., 2016). |
| Dataset Splits | No | The paper mentions varying the 'number of transitions n' but does not specify explicit training, validation, or test dataset splits in terms of percentages or counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions 'CVXPY (Diamond & Boyd, 2016; Agrawal et al., 2018)' and 'OpenAI Gym environments (Brockman et al., 2016)' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The default parameters (when not varied) are: discount factor γ = 0.95; horizon length T = 50 for Inverted-Pendulum and T = 100 for Puck-Mountain; number of episodes 20; failure probability δ = 0.10; temperature of the behavior policy τ = 1; and the feature dimension 10 (collected into a configuration sketch below). |
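
Algorithms 1 and 2 build confidence bounds from the kernel Bellman loss (Feng et al., 2019), a quadratic statistic of Bellman residuals. The sketch below is a minimal illustration of that statistic under our own assumptions, not the authors' implementation (the table notes that no code was released): the RBF kernel, the bandwidth, and all function names are ours, and the expected next-step Q-values under the target policy are assumed precomputed from the logged transitions.

```python
import numpy as np

def rbf_kernel(X, bandwidth=1.0):
    """Gaussian RBF kernel matrix over rows of X (state-action features).
    Kernel choice and bandwidth are assumptions, not from the paper."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def bellman_residuals(q_sa, rewards, expected_next_q, gamma=0.95):
    """eps_i = r_i + gamma * E_{a'~pi}[q(s'_i, a')] - q(s_i, a_i):
    the empirical Bellman residual of a candidate value function q
    on each logged transition, under the target policy pi."""
    return rewards + gamma * expected_next_q - q_sa

def kernel_bellman_loss(residuals, sa_features, bandwidth=1.0):
    """V-statistic form of the kernel Bellman loss,
    L_K(q) = (1/n^2) * sum_ij eps_i k(x_i, x_j) eps_j,
    the quantity the paper's concentration bounds are built on."""
    K = rbf_kernel(sa_features, bandwidth)
    n = len(residuals)
    return residuals @ K @ residuals / n ** 2

# Hypothetical usage with random placeholder data (n transitions, d features).
rng = np.random.default_rng(0)
n, d = 20, 10
sa = rng.normal(size=(n, d))        # state-action features
q_sa = rng.normal(size=n)           # q(s_i, a_i)
exp_next_q = rng.normal(size=n)     # E_{a'~pi}[q(s'_i, a')], assumed precomputed
rewards = rng.normal(size=n)
eps = bellman_residuals(q_sa, rewards, exp_next_q, gamma=0.95)
print(kernel_bellman_loss(eps, sa))
```

For a strictly positive-definite kernel matrix, this loss is zero exactly when every Bellman residual vanishes, which is what lets a deviation bound on the statistic be converted into a confidence interval for the policy value.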
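
The default experiment settings quoted above can be gathered into a single configuration object. This is an illustrative sketch only; the field names are hypothetical, since the authors did not release code.

```python
from dataclasses import dataclass

@dataclass
class OPEExperimentConfig:
    """Default settings quoted from the paper's experiment setup.
    All field names are our own, illustrative choices."""
    gamma: float = 0.95          # discount factor
    horizon: int = 50            # T = 50 (Inverted-Pendulum) or 100 (Puck-Mountain)
    num_episodes: int = 20       # number of behavior-policy episodes
    delta: float = 0.10          # failure probability of the confidence bound
    temperature: float = 1.0     # temperature tau of the behavior policy
    feature_dim: int = 10        # dimension of the feature map

pendulum_cfg = OPEExperimentConfig()                  # T = 50
puck_mountain_cfg = OPEExperimentConfig(horizon=100)  # T = 100
```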