More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Authors: Aurelien Bibaut, Ivana Malenica, Nikos Vlassis, Mark van der Laan
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that our estimator uniformly wins over existing off-policy evaluation methods across multiple RL environments and various levels of model misspecification. In this section, we demonstrate the effectiveness of RLTMLE by comparing it with other state-of-the-art methods for the OPE problem in various RL benchmark environments. |
| Researcher Affiliation | Collaboration | 1University of California, Berkeley, CA 2Netflix, Los Gatos, CA. |
| Pseudocode | Yes | We present the pseudo-code of the procedure as Algorithm 1. Because of space limitations, we only give a pseudo-code description of RLTMLE 2, which is our most performant algorithm, as we will see in the next section. |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper mentions using well-known RL benchmark environments like 'Grid World', 'Model Fail', and 'Model Win' and states 'We implement the same behavior and evaluation policies as in previous work (Thomas & Brunskill, 2016; Farajtabar et al., 2018).', but it does not provide concrete access information (specific links, DOIs, repositories, or formal citations including authors and year for the datasets themselves) to these datasets. |
| Dataset Splits | No | The paper describes an internal sample split (D(0) and D(1)) used by the algorithm itself, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) for evaluating overall performance on unseen data. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries). |
| Experiment Setup | Yes | In addition, we test sensitivity to the number of episodes in D with n = {100, 200, 500, 1000} for Grid World and Model Fail, and n = {100, 500, 1000, 5000, 10000} for Model Win. We start with a small amount of bias, b0 = 0.005 · Normal(0, 1)... Consequently, we increase model misspecification to b0 = 0.05 · Normal(0, 1). (See the configuration sketch after the table.) |
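
The quoted setup fixes only a few knobs: the episode counts per environment and the scale of the additive misspecification bias b0 · Normal(0, 1). The snippet below is a minimal sketch of how one might enumerate that grid. The environment keys, the `sample_bias` helper, and the use of NumPy are illustrative assumptions, not the authors' code; only the n values and the b0 scales come from the paper's quoted description.

```python
# Hypothetical sketch of the experiment grid quoted in the table above.
# Not the authors' code: names and structure are illustrative only.
import numpy as np

# Episode counts per benchmark environment, as quoted from the paper.
EPISODE_COUNTS = {
    "Grid World": [100, 200, 500, 1000],
    "Model Fail": [100, 200, 500, 1000],
    "Model Win": [100, 500, 1000, 5000, 10000],
}

# Model-misspecification levels: additive bias drawn as b0 * Normal(0, 1).
BIAS_SCALES = [0.005, 0.05]  # small bias first, then increased misspecification


def sample_bias(b0: float, size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw additive model bias b0 * Normal(0, 1), per the quoted description."""
    return b0 * rng.standard_normal(size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for env, episode_counts in EPISODE_COUNTS.items():
        for n in episode_counts:
            for b0 in BIAS_SCALES:
                bias = sample_bias(b0, size=n, rng=rng)
                print(f"{env}: n={n}, b0={b0}, mean |bias|={np.abs(bias).mean():.4f}")
```

A harness like this only enumerates the (environment, n, b0) combinations reported above; an actual reproduction would still need the behavior and evaluation policies referenced from Thomas & Brunskill (2016) and Farajtabar et al. (2018), which the paper does not specify in reproducible detail.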