CoinDICE: Off-Policy Confidence Interval Estimation
Authors: Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvari, Dale Schuurmans
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now evaluate the empirical performance of CoinDICE, comparing it to a number of existing confidence interval estimators for OPE based on concentration inequalities. |
| Researcher Affiliation | Collaboration | ¹Google Research, Brain Team; ²University of Alberta; ³DeepMind |
| Pseudocode | No | The provided text does not contain any structured pseudocode or algorithm blocks, although it refers to "Algorithm 1" in an appendix. |
| Open Source Code | Yes | Open-source code for CoinDICE is available at https://github.com/google-research/dice_rl. |
| Open Datasets | Yes | We use Frozen Lake (Brockman et al., 2016), a highly stochastic gridworld environment, and Taxi (Dietterich, 1998), an environment with a moderate state space of 2,000 elements. ... Lastly, we evaluate CoinDICE on Reacher (Brockman et al., 2016; Todorov et al., 2012), a continuous control environment. |
| Dataset Splits | No | The paper mentions collecting a static dataset and sampling from it, but does not specify explicit training/validation/test splits, percentages, or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions). |
| Experiment Setup | Yes | The discount factor is γ = 0.99. The target policy is taken to be a near-optimal one, while the behavior policy is highly suboptimal. The behavior policy in Frozen Lake is the optimal policy with 0.2 white noise... in this setting, we use a one-hidden-layer neural network with ReLU activations. |
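
The Experiment Setup row quotes a discount factor of γ = 0.99 and a behavior policy built from the optimal policy with 0.2 noise. The sketch below illustrates those two quantities only; it is not taken from the paper or from the dice_rl repository. It assumes that "0.2 white noise" means the agent takes a uniformly random action with probability 0.2, which the quoted text does not confirm, and the function names `discounted_return` and `noisy_policy` are hypothetical.

```python
import random

GAMMA = 0.99  # discount factor quoted in the Experiment Setup row
NOISE = 0.2   # ASSUMPTION: probability of replacing the optimal action with a uniform random one


def discounted_return(rewards, gamma=GAMMA):
    """Return sum_t gamma**t * rewards[t] for a single trajectory."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def noisy_policy(optimal_action, num_actions, noise=NOISE, rng=random):
    """Hypothetical behavior policy: follow the optimal action, but with
    probability `noise` pick an action uniformly at random instead."""
    if rng.random() < noise:
        return rng.randrange(num_actions)
    return optimal_action
```

For example, `discounted_return([1.0, 1.0, 1.0])` evaluates 1 + 0.99 + 0.99², and `noisy_policy(2, 4, noise=0.0)` always returns the optimal action 2. Other readings of "white noise" (for instance, perturbing action probabilities rather than mixing with a uniform policy) would need a different construction.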