DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

Authors: Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "we present an empirical study of our algorithm applied to off-policy policy evaluation and find that our algorithm significantly improves accuracy compared to existing techniques." |
| Researcher Affiliation | Industry | Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li; Google Research; {ofirnachum,yinlamchow,bodai,lihong}@google.com |
| Pseudocode | No | The paper describes the algorithm steps in text and mathematical formulations but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Find code at https://github.com/google-research/google-research/tree/master/dual_dice." |
| Open Datasets | Yes | "We begin with a tabular task, the Taxi domain [7]. In this task, we evaluate our method in a manner agnostic to optimization difficulties: The objective (6) is a quadratic equation in ν, and thus may be solved by matrix operations. [...] We now move on to difficult control tasks: A discrete-control task Cartpole and a continuous-control task Reacher [4]." |
| Dataset Splits | No | The paper mentions collecting trajectories for experiments but does not specify explicit training, validation, or test splits in terms of percentages, sample counts, or predefined partitions. |
| Hardware Specification | No | The paper mentions 'neural network function approximators' and 'stochastic optimization' but gives no details of the hardware used (e.g., CPU or GPU models, memory). |
| Software Dependencies | No | The paper mentions 'neural network function approximators' but does not give versions for any software libraries, frameworks, or programming languages (e.g., TensorFlow, PyTorch, or Python versions). |
| Experiment Setup | No | The paper discusses different choices of the function f and varying trajectory lengths and counts, but the main text does not give specific hyperparameters such as learning rates, batch sizes, or optimizer settings. |
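The Open Datasets row quotes the paper's observation that, in the tabular Taxi setting, the DualDICE objective is quadratic in ν and can therefore be minimized by matrix operations rather than stochastic optimization. A minimal sketch of that idea, assuming a generic quadratic objective J(ν) = ½ νᵀAν − bᵀν with a symmetric positive-definite A (the matrices here are random illustrations, not the paper's actual objective (6)):

```python
import numpy as np

# Hypothetical sketch: a quadratic objective J(nu) = 0.5 * nu^T A nu - b^T nu
# has the closed-form minimizer nu* = A^{-1} b, obtained by solving A nu = b.
rng = np.random.default_rng(0)
n = 5  # illustrative number of tabular state-action entries

# Build a symmetric positive-definite A so the quadratic has a unique minimum.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

# "Solved by matrix operations": one linear solve instead of gradient descent.
nu_star = np.linalg.solve(A, b)

# Sanity check: the gradient A nu - b vanishes at the minimizer.
grad = A @ nu_star - b
print(np.allclose(grad, 0.0))  # → True
```

This closed-form route is what lets the tabular experiment assess estimation quality independently of optimization difficulties; the neural-network variants of the algorithm must instead rely on stochastic optimization.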