Accountable Off-Policy Evaluation With Kernel Bellman Statistics

Authors: Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that our method yields tight confidence intervals in different settings.
Researcher Affiliation | Academia | Department of Computer Science, The University of Texas at Austin.
Pseudocode | Yes | Algorithm 1: Confidence Bounds for Off-Policy Evaluation; Algorithm 2: Post-hoc Diagnosis for Existing Estimators. (A hedged sketch of Algorithm 1 follows the table.)
Open Source Code | No | The paper provides no explicit statement or link for open-source code implementing the described method.
Open Datasets | Yes | We use OpenAI Gym environments (Brockman et al., 2016).
Dataset Splits | No | The paper varies the number of transitions n but does not specify explicit training, validation, or test splits by percentage or count.
Hardware Specification | No | The paper does not describe the hardware used for the experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper cites CVXPY (Diamond & Boyd, 2016; Agrawal et al., 2018) and OpenAI Gym (Brockman et al., 2016) but gives no version numbers for either.
Experiment Setup | Yes | Default parameters (when not varied): discount factor γ = 0.95; horizon T = 50 for Inverted-Pendulum and T = 100 for Puck-Mountain; 20 episodes; failure probability δ = 0.10; behavior-policy temperature τ = 1; feature dimension 10. (These defaults are collected in the configuration sketch below.)
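
Since the code is not released, the following is a minimal sketch of how Algorithm 1's confidence bounds could be computed in the linear-feature setting the experiments use. Everything concrete here is an assumption for illustration: the function name, the feature-map inputs, and the use of CVXPY to optimize over the kernel-Bellman feasible set are not taken from the paper's (unavailable) implementation, and the concentration threshold eps_n(δ) is passed in as a given rather than derived.

```python
# Hedged sketch of Algorithm 1 (Confidence Bounds for Off-Policy Evaluation),
# assuming a linear value class f(s) = theta^T phi(s). The concentration
# threshold eps = eps_n(delta) on the empirical kernel Bellman loss is
# treated as a given input rather than derived here.
import numpy as np
import cvxpy as cp

def kernel_bellman_confidence_bounds(phi_s, phi_next, rewards, phi_init,
                                     kernel_mat, gamma, eps):
    """Return (lower, upper) confidence bounds on the target policy's value.

    phi_s, phi_next : (n, d) features of states s_i and successors s'_i
    rewards         : (n,) observed rewards r_i
    phi_init        : (d,) mean initial-state feature, E[phi(s_0)]
    kernel_mat      : (n, n) PSD kernel matrix over the transitions
    gamma           : discount factor
    eps             : threshold eps_n(delta) for the kernel Bellman loss
    """
    n, d = phi_s.shape
    # The Bellman residual is affine in theta:
    #   R_i(theta) = r_i + gamma * f(s'_i) - f(s_i) = r_i - Delta_i^T theta.
    Delta = phi_s - gamma * phi_next

    # Factor K = M^T M once so the kernel Bellman loss is a sum of squares.
    w, V = np.linalg.eigh(kernel_mat)
    M = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

    theta = cp.Variable(d)
    resid = rewards - Delta @ theta
    # Empirical kernel Bellman loss: (1/n^2) * R(theta)^T K R(theta).
    feasible = [cp.sum_squares(M @ resid) / n**2 <= eps]

    # Optimize the (linear) policy-value objective over the feasible set.
    lower = cp.Problem(cp.Minimize(phi_init @ theta), feasible).solve()
    upper = cp.Problem(cp.Maximize(phi_init @ theta), feasible).solve()
    return lower, upper
```

Both problems are convex (a linear objective under one convex quadratic constraint), which is consistent with the paper's reliance on CVXPY.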
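
The defaults in the Experiment Setup row can likewise be collected into one configuration block. This is a hedged sketch: the variable names are illustrative, since the paper releases no code, but the values are exactly those reported.

```python
# Default experiment parameters reported in the paper; names are illustrative.
DEFAULTS = dict(
    gamma=0.95,        # discount factor
    horizon={"Inverted-Pendulum": 50, "Puck-Mountain": 100},  # episode length T
    num_episodes=20,   # number of behavior-policy episodes
    delta=0.10,        # failure probability of the confidence bounds
    tau=1.0,           # temperature of the behavior policy
    feature_dim=10,    # dimension of the feature map
)
```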