Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Authors: Nan Jiang, Lihong Li

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the estimator's accuracy in several benchmark problems, and illustrate its use as a subroutine in safe policy improvement." (Section 6: Experiments)
Researcher Affiliation | Collaboration | Nan Jiang (NANJIANG@UMICH.EDU), Computer Science & Engineering, University of Michigan; Lihong Li (LIHONGLI@MICROSOFT.COM), Microsoft Research
Pseudocode | No | The paper defines estimators using mathematical equations (e.g., Eqn. 10) but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks; a hedged sketch of the recursive DR estimator is given after the table.
Open Source Code | No | The paper does not include an explicit statement about releasing its source code or provide any links to a code repository for the methodology described.
Open Datasets | Yes | "In the last domain, we use the donation dataset from KDD Cup 1998 (Hettich & Bay, 1999)."
Dataset Splits | Yes | "We therefore split Deval further into two subsets Dreg and Dtest, estimate Q̂ from Dreg and apply DR on Dtest." "[W]e partition Deval into k subsets, apply Eqn. (8) to each subset with Q̂ estimated from the remaining data, and finally average the estimate over all subsets." "[W]e split |D| so that |Dtrain|/|D| ∈ {0.2, 0.4, 0.6, 0.8}." A cross-fitting sketch of this split appears after the table.
Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup | Yes | "Model fitting: We use state aggregations: the two state variables are multiplied by 2^6 and 2^8 respectively, and the rounded integers are treated as the abstract state. We then estimate an MDP model from data using a tabular approach." "[M]ix πtrain and π0 with rate α ∈ {0, 0.1, . . . , 0.9}." Both details are sketched below.
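
For concreteness, the following is a minimal Python sketch of the doubly robust value estimate for a single trajectory, written in the recursive form of Eqn. (10): V_DR <- V̂(s_t) + ρ_t (r_t + γ V_DR - Q̂(s_t, a_t)), applied backwards from t = H. The function names, trajectory layout, and policy interfaces are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: doubly robust (DR) off-policy value estimate for one
# trajectory, following the recursive form of Eqn. (10).
def dr_estimate(trajectory, pi_e, pi_b, q_hat, v_hat, gamma=1.0):
    """trajectory: list of (state, action, reward) tuples for t = 1..H.
    pi_e(a, s) / pi_b(a, s): action probabilities under the evaluation
    (target) and behavior policies.  q_hat(s, a), v_hat(s): approximate
    value functions, e.g. estimated from a separate data split."""
    v_dr = 0.0
    # Work backwards: V_DR <- V_hat(s_t) + rho_t * (r_t + gamma * V_DR - Q_hat(s_t, a_t)),
    # where rho_t = pi_e(a_t|s_t) / pi_b(a_t|s_t) is the per-step importance weight.
    for state, action, reward in reversed(trajectory):
        rho = pi_e(action, state) / pi_b(action, state)
        v_dr = v_hat(state) + rho * (reward + gamma * v_dr - q_hat(state, action))
    return v_dr
```

The overall estimate is then the average of dr_estimate over all trajectories in the evaluation set.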
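The k-fold procedure quoted under Dataset Splits can be read as a cross-fitting loop: fit Q̂ on the data outside each fold, apply the DR estimator to the fold itself, and average over folds. A minimal sketch, reusing dr_estimate above and assuming a hypothetical fit_q_hat(train_trajectories) helper that returns (q_hat, v_hat):

```python
# Hedged sketch of k-fold cross-fitting for the DR estimator; fit_q_hat and
# the data layout are assumptions for illustration.
import numpy as np

def cross_fitted_dr(trajectories, pi_e, pi_b, fit_q_hat, k=2, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(trajectories)), k)
    estimates = []
    for fold in folds:
        held_out = set(fold.tolist())
        # Q_hat is estimated only from trajectories outside the current fold.
        train = [traj for i, traj in enumerate(trajectories) if i not in held_out]
        test = [trajectories[i] for i in fold]
        q_hat, v_hat = fit_q_hat(train)
        estimates.append(np.mean([
            dr_estimate(traj, pi_e, pi_b, q_hat, v_hat, gamma) for traj in test
        ]))
    # Average the per-fold DR estimates over all k subsets.
    return float(np.mean(estimates))
```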
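The Experiment Setup row quotes two concrete recipes: state aggregation by scaling and rounding the two state variables, and building policies by mixing πtrain and π0 at rate α. A minimal sketch under stated assumptions (the 2^6 and 2^8 scale factors as recovered above, and a simple probability mixture in which α weights πtrain; the paper's exact mixing convention is not restated here):

```python
# Hedged sketch of the quoted experiment-setup details; everything beyond the
# scale-and-round aggregation and the alpha-mixing idea is an assumption.
def aggregate_state(x1, x2):
    # Multiply the two state variables by 2^6 and 2^8 and round, so the pair
    # of integers serves as the abstract (tabular) state for model fitting.
    return (round(x1 * 2 ** 6), round(x2 * 2 ** 8))

def mix_policies(pi_train, pi_0, alpha):
    # Mixture policy: with rate alpha follow pi_train, otherwise pi_0
    # (assumed convention; alpha ranges over {0, 0.1, ..., 0.9}).
    def pi(action, state):
        return alpha * pi_train(action, state) + (1.0 - alpha) * pi_0(action, state)
    return pi
```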