DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Authors: Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an empirical study of our algorithm applied to off-policy policy evaluation and find that our algorithm significantly improves accuracy compared to existing techniques. |
| Researcher Affiliation | Industry | Ofir Nachum Yinlam Chow Bo Dai Lihong Li Google Research {ofirnachum,yinlamchow,bodai,lihong}@google.com |
| Pseudocode | No | The paper describes the algorithm steps in text and mathematical formulations but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Find code at https://github.com/google-research/google-research/tree/master/dual_dice. |
| Open Datasets | Yes | We begin with a tabular task, the Taxi domain [7]. In this task, we evaluate our method in a manner agnostic to optimization difficulties: The objective (6) is a quadratic equation in ν, and thus may be solved by matrix operations. [...] We now move on to difficult control tasks: A discrete-control task Cartpole and a continuous-control task Reacher [4]. |
| Dataset Splits | No | The paper mentions collecting trajectories for experiments but does not specify explicit training, validation, or test dataset splits in terms of percentages, sample counts, or specific predefined splits. |
| Hardware Specification | No | The paper mentions the use of 'neural network function approximators' and 'stochastic optimization' but does not specify any details regarding the hardware used for these computations (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions 'neural network function approximators' but does not provide specific versions for any software libraries, frameworks, or programming languages used (e.g., TensorFlow version, PyTorch version, Python version). |
| Experiment Setup | No | The paper discusses different choices of function f and varying trajectory lengths/numbers. However, it does not explicitly provide specific hyperparameters such as learning rates, batch sizes, optimizer settings, or other detailed training configurations in the main text. |
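The Taxi-domain quote above notes that the tabular objective is quadratic in ν and can be minimized by matrix operations rather than stochastic optimization. A minimal sketch of what that means, using a generic quadratic J(ν) = ½ νᵀAν − bᵀν with illustrative A and b (these stand-ins are assumptions for exposition, not the paper's actual objective (6)):

```python
import numpy as np

# Hypothetical tabular setup: minimize J(nu) = 0.5 * nu^T A nu - b^T nu.
# A and b here are randomly generated placeholders, not the matrices that
# DualDICE's objective (6) would induce on the Taxi domain.
rng = np.random.default_rng(0)
n = 5  # toy number of (state, action) pairs

# Build a symmetric positive-definite A so the quadratic has a unique minimizer.
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)
b = rng.normal(size=n)

# Setting the gradient A nu - b = 0 yields the closed-form solution,
# i.e. "solved by matrix operations" with no gradient descent needed.
nu_star = np.linalg.solve(A, b)

# Verify first-order optimality: the gradient at nu_star is numerically zero.
grad = A @ nu_star - b
print(np.allclose(grad, 0.0))  # True
```

In the tabular regime this closed-form solve removes optimizer hyperparameters from the picture, which is why the paper can study estimation error there "in a manner agnostic to optimization difficulties".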