reproducibilityindex.ai

Accountable Off-Policy Evaluation With Kernel Bellman Statistics

Authors: Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results show that our method yields tight confidence intervals in different settings.
Researcher Affiliation	Academia	Department of Computer Science, The University of Texas at Austin.
Pseudocode	Yes	Algorithm 1 Confidence Bounds for Off-Policy Evaluation; Algorithm 2 Post-hoc Diagnosis for Existing Estimators
Open Source Code	No	The paper does not provide an explicit statement or link for the open-source code of the methodology described.
Open Datasets	Yes	We use OpenAI Gym environments (Brockman et al., 2016).
Dataset Splits	No	The paper mentions varying the 'number of transitions n' but does not specify explicit training, validation, or test dataset splits in terms of percentages or counts.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory).
Software Dependencies	No	The paper mentions 'CVXPY (Diamond & Boyd, 2016; Agrawal et al., 2018)' and 'OpenAI Gym environments (Brockman et al., 2016)' but does not provide specific version numbers for these software components.
Experiment Setup	Yes	The default parameters (when not varied) are: discounted factor γ = 0.95; horizon length T = 50 for Inverted-Pendulum and T = 100 for Puck-Mountain; number of episodes 20; failure probability δ = 0.10; temperature of the behavior policy τ = 1; and the feature dimension 10.
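
For anyone attempting a reproduction, the defaults quoted in the Experiment Setup row can be collected into a small configuration object. The following is only a sketch: the names `OPEDefaults`, `HORIZON`, and `num_transitions` are hypothetical and do not come from the paper or any released code, and the transition count assumes every episode runs for its full horizon, which the table does not confirm.

```python
from dataclasses import dataclass

# Hypothetical container for the defaults listed in the Experiment Setup row.
# Field names are illustrative; they are not taken from the authors' code.
@dataclass(frozen=True)
class OPEDefaults:
    gamma: float = 0.95                # discount factor
    episodes: int = 20                 # number of episodes
    delta: float = 0.10                # failure probability
    behavior_temperature: float = 1.0  # temperature tau of the behavior policy
    feature_dim: int = 10              # feature dimension

# Horizon length T per environment, as stated in the table.
HORIZON = {"Inverted-Pendulum": 50, "Puck-Mountain": 100}

def num_transitions(env_name: str, cfg: OPEDefaults = OPEDefaults()) -> int:
    """Transitions collected under the defaults.

    Assumption: each episode runs for the full horizon T (no early termination).
    """
    return cfg.episodes * HORIZON[env_name]

if __name__ == "__main__":
    for env in HORIZON:
        print(env, num_transitions(env))
```

Keeping the values in a frozen dataclass makes it harder to change a default by accident when sweeping a single parameter, such as the number of transitions n.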