Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation

Authors: Ziyang Tang*, Yihao Feng*, Lihong Li, Dengyong Zhou, Qiang Liu

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Both theoretical and empirical results show that the method yields significant advantages over previous methods.
Researcher Affiliation | Collaboration | Ziyang Tang* (The University of Texas at Austin, ztang@cs.utexas.edu); Yihao Feng (The University of Texas at Austin, yihao@cs.utexas.edu); Lihong Li (Google Research, lihong@google.com); Dengyong Zhou (Google Research, dennyzhou@google.com); Qiang Liu (The University of Texas at Austin, lqiang@cs.utexas.edu)
Pseudocode | Yes | Algorithm 1: Infinite Horizon Doubly Robust Estimator (see the hedged sketch after this table).
Open Source Code | No | The paper mentions using an 'open source implementation' (footnote 2) for deep Q-learning, which points to a third-party repository, and provides a link for 'additional experimental results' (footnote 3), but it contains no unambiguous statement that the authors release code for their own method.
Open Datasets | Yes | Taxi environment: 'We follow Liu et al. (2018a)'s tabular environment Taxi.'
Dataset Splits | No | The paper mentions using 'a set of independent samples to first train a value function V̂ and a density function ρ̂' and 'a separate training dataset with 200 trajectories whose horizon length is 1000', but it does not give explicit numerical splits (e.g., percentages or counts) for training, validation, or test sets.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments.
Software Dependencies | No | The paper mentions 'OpenAI Gym' and the 'Adam optimizer' but does not provide version numbers for any software dependencies required to replicate the experiments.
Experiment Setup | Yes | 'For more experimental details, please check appendix C.1.'
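
To give a concrete picture of what the pseudocode row refers to, below is a minimal NumPy sketch of a doubly robust off-policy estimate in the infinite-horizon (discounted) setting. This is an illustration only, not the paper's exact Algorithm 1: the function names (`dr_estimate`, `V_hat`, `w_hat`, `policy_ratio`) and the precise formulation are assumptions made for this sketch.

```python
# Minimal sketch (NOT the paper's exact Algorithm 1) of an infinite-horizon
# doubly robust off-policy estimate: it combines a fitted value function V_hat
# with an estimated stationary state density ratio w_hat. Names are illustrative.
import numpy as np

def dr_estimate(transitions, init_states, V_hat, w_hat, policy_ratio, gamma=0.99):
    """Doubly robust estimate of the target policy's normalized discounted return.

    transitions        : iterable of (s, a, r, s_next) sampled from the behavior policy
    init_states        : states sampled from the initial state distribution
    V_hat(s)           : approximate value function of the target policy
    w_hat(s)           : approximate density ratio d_target(s) / d_behavior(s)
    policy_ratio(s, a) : importance ratio pi_target(a|s) / pi_behavior(a|s)
    """
    # Baseline term: expected value under the initial state distribution.
    baseline = (1.0 - gamma) * np.mean([V_hat(s0) for s0 in init_states])

    # Correction term: density-ratio-weighted temporal-difference errors.
    corrections = [
        w_hat(s) * policy_ratio(s, a) * (r + gamma * V_hat(s_next) - V_hat(s))
        for (s, a, r, s_next) in transitions
    ]
    return baseline + np.mean(corrections)
```

The doubly robust property such an estimator is meant to illustrate is that its bias scales with the product of the errors of V̂ and the density-ratio estimate, so the estimate remains accurate when either component is accurate.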