DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

Authors: Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "we present an empirical study of our algorithm applied to off-policy policy evaluation and find that our algorithm significantly improves accuracy compared to existing techniques." |
| Researcher Affiliation | Industry | Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li; Google Research; {ofirnachum,yinlamchow,bodai,lihong}@google.com |
| Pseudocode | No | The paper describes the algorithm steps in text and mathematical formulations but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Find code at https://github.com/google-research/google-research/tree/master/dual_dice." |
| Open Datasets | Yes | "We begin with a tabular task, the Taxi domain [7]. In this task, we evaluate our method in a manner agnostic to optimization difficulties: The objective (6) is a quadratic equation in ν, and thus may be solved by matrix operations. [...] We now move on to difficult control tasks: A discrete-control task Cartpole and a continuous-control task Reacher [4]." |
| Dataset Splits | No | The paper mentions collecting trajectories for experiments but does not specify explicit training, validation, or test splits in terms of percentages, sample counts, or predefined partitions. |
| Hardware Specification | No | The paper mentions 'neural network function approximators' and 'stochastic optimization' but gives no details of the hardware used (e.g., CPU or GPU models, memory). |
| Software Dependencies | No | The paper mentions 'neural network function approximators' but does not give versions for any software libraries, frameworks, or programming languages (e.g., TensorFlow, PyTorch, or Python versions). |
| Experiment Setup | No | The paper discusses different choices of the function f and varying trajectory lengths and counts, but the main text does not give specific hyperparameters such as learning rates, batch sizes, or optimizer settings. |
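The Open Datasets row quotes the paper's observation that, in the tabular Taxi setting, the DualDICE objective is quadratic in ν and can therefore be minimized by matrix operations rather than stochastic optimization. A minimal sketch of that idea, assuming a generic quadratic objective J(ν) = ½ νᵀAν − bᵀν with a symmetric positive-definite A (the matrices here are random illustrations, not the paper's actual objective (6)):

```python
import numpy as np

# Hypothetical sketch: a quadratic objective J(nu) = 0.5 * nu^T A nu - b^T nu
# has the closed-form minimizer nu* = A^{-1} b, obtained by solving A nu = b.
rng = np.random.default_rng(0)
n = 5  # illustrative number of tabular state-action entries

# Build a symmetric positive-definite A so the quadratic has a unique minimum.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

# "Solved by matrix operations": one linear solve instead of gradient descent.
nu_star = np.linalg.solve(A, b)

# Sanity check: the gradient A nu - b vanishes at the minimizer.
grad = A @ nu_star - b
print(np.allclose(grad, 0.0))  # → True
```

This closed-form route is what lets the tabular experiment assess estimation quality independently of optimization difficulties; the neural-network variants of the algorithm must instead rely on stochastic optimization.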