Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Authors: Nathan Kallus, Masatoshi Uehara
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Besides the theoretical guarantees, empirical studies suggest the new estimators provide advantages. |
| Researcher Affiliation | Academia | Nathan Kallus, Cornell University, New York, NY (kallus@cornell.edu); Masatoshi Uehara, Harvard University, Cambridge, MA (uehara_m@g.harvard.edu) |
| Pseudocode | No | The paper describes algorithms and derivations in text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate the OPE algorithms using the standard classification datasets from the UCI repository. Here, we follow the same procedure of transforming a classification dataset into a contextual bandit dataset as in [5, 6]. ... We next compare the OPE algorithms in three standard RL settings from OpenAI Gym [3]: Windy Grid World, Cliff Walking, and Mountain Car. (A sketch of the classification-to-bandit conversion is given below the table.) |
| Dataset Splits | Yes | We first split the data into training and evaluation. ... We again split the data into training and evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory). |
| Software Dependencies | No | The paper mentions methods like 'logistic regression', 'Q-learning', and 'off-policy TD learning', and refers to 'Open AI Gym', but it does not specify any version numbers for these software components or libraries. |
| Experiment Setup | Yes | The resulting estimation RMSEs (root mean square errors) over 200 replications of each experiment are given in Tables 2–4, where we highlight in bold the best two methods in each case. We again split the data into training and evaluation. ... We set the discounting factor to be 1.0 as in [6]. (The RMSE computation is sketched below the table.) |
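
The classification-to-bandit conversion referenced in the Open Datasets row follows the standard recipe from the cited prior work: contexts are the feature vectors, actions are the class labels, and the reward is 1 exactly when the logged action matches the true label. The sketch below illustrates this under stated assumptions; the function name `to_contextual_bandit` and the logging-policy matrix `behavior_probs` are illustrative placeholders, not names from the paper or its code.

```python
import numpy as np

def to_contextual_bandit(X, y, n_actions, behavior_probs, seed=None):
    """Turn a classification dataset into logged bandit feedback.

    X: (n, d) contexts; y: (n,) true labels in {0, ..., n_actions - 1};
    behavior_probs: (n, n_actions) action probabilities of the logging policy.
    Returns contexts, logged actions, observed rewards, and logging propensities.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Sample one action per context from the logging (behavior) policy.
    actions = np.array(
        [rng.choice(n_actions, p=behavior_probs[i]) for i in range(n)]
    )
    # Only the reward of the logged action is observed: 1 if it matches the
    # true label, 0 otherwise, mimicking partial bandit feedback.
    rewards = (actions == y).astype(float)
    propensities = behavior_probs[np.arange(n), actions]
    return X, actions, rewards, propensities
```

The logged tuples (context, action, reward, propensity) can then be fed to the OPE estimators compared in the paper, while the held-out full-label data provides the ground-truth value of the target policy.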
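
The RMSE metric in the Experiment Setup row is the standard root mean squared error of per-replication estimates against the true policy value. The snippet below is a minimal sketch of that definition, not the authors' evaluation script; the variable names and the synthetic example numbers are placeholders.

```python
import numpy as np

def rmse(estimates, true_value):
    """Root mean squared error of per-replication OPE estimates."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_value) ** 2)))

# Example with synthetic numbers (200 replications, matching the paper's protocol).
dummy_estimates = np.random.default_rng(0).normal(loc=1.0, scale=0.1, size=200)
print(rmse(dummy_estimates, true_value=1.0))
```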