GenDICE: Generalized Offline Estimation of Stationary Values
Authors: Ruiyi Zhang*, Bo Dai*, Lihong Li, Dale Schuurmans
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove its consistency under general conditions, provide an error analysis, and demonstrate strong empirical performance on benchmark problems, including offline PageRank and off-policy policy evaluation. In this section, we evaluate GenDICE on OPE and OPR problems. |
| Researcher Affiliation | Collaboration | Ruiyi Zhang¹, Bo Dai², Lihong Li², Dale Schuurmans²; ¹Duke University, ²Google Research, Brain Team |
| Pseudocode | Yes | Algorithm 1: GenDICE (with function approximators) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/zhangry868/GenDICE. |
| Open Datasets | Yes | We test GenDICE on a Barabási-Albert (BA) graph (synthetic) and two real-world graphs, Cora and Citeseer. Details of the graphs are given in Appendix D. The two real-world graphs are built from citation networks. (A construction sketch for the synthetic graph appears after the table.) |
| Dataset Splits | No | The paper mentions collecting a "fixed number of trajectories" and using "off-policy data", but it does not provide specific details on how these datasets are split into training, validation, and test sets, such as percentages, absolute counts, or references to predefined standard splits for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU types (e.g., Intel Xeon, AMD Ryzen), or memory specifications. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and neural network architectures (e.g., "feed-forward with two hidden layers of dimension 64 and tanh activations"), but it does not specify any software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | We train each stationary distribution correction estimation method using the Adam optimizer with batches of size 2048 and learning rates chosen by a hyperparameter search over {0.0001, 0.0003, 0.001, 0.003}, with 0.0003 selected as the best. All neural networks are feed-forward with two hidden layers of dimension 64 and tanh activations. (See the configuration sketch after the table.) |
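
For the "Open Datasets" row, the synthetic BA graph can be regenerated with standard tooling. The sketch below is a minimal illustration using networkx; the node count and attachment parameter are placeholders, since the actual graph sizes are reported in Appendix D of the paper and are not part of this excerpt.

```python
# Sketch (not from the paper): building a synthetic Barabasi-Albert graph of the
# kind used in the offline PageRank experiments. Node/edge counts are illustrative.
import networkx as nx
import numpy as np

def make_ba_graph(num_nodes=100, num_edges_per_node=4, seed=0):
    """Build a BA graph and return it with its row-normalized transition matrix."""
    graph = nx.barabasi_albert_graph(num_nodes, num_edges_per_node, seed=seed)
    adjacency = nx.to_numpy_array(graph)
    # Normalize each row so it defines a distribution over neighboring nodes.
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)
    return graph, transition

graph, transition = make_ba_graph()
print(transition.shape)  # (100, 100) with the placeholder settings above
```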
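For the "Experiment Setup" row, a minimal sketch of the reported configuration follows. The paper does not name a deep-learning framework (see the "Software Dependencies" row), so PyTorch and the `make_mlp` / `ratio_net` names below are assumptions made purely for illustration; only the hidden-layer sizes, activations, batch size, and learning rate come from the quoted setup.

```python
# Sketch (framework assumed, not specified in the paper): feed-forward networks
# with two hidden layers of width 64 and tanh activations, trained with Adam,
# batch size 2048, learning rate 3e-4 chosen from {1e-4, 3e-4, 1e-3, 3e-3}.
import torch
import torch.nn as nn

def make_mlp(input_dim, output_dim, hidden_dim=64):
    """Two hidden layers of dimension 64 with tanh activations, as reported."""
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim), nn.Tanh(),
        nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
        nn.Linear(hidden_dim, output_dim),
    )

BATCH_SIZE = 2048
LEARNING_RATE = 3e-4  # best value from the reported hyperparameter search

# `state_dim` and `ratio_net` are illustrative: GenDICE trains a stationary
# distribution correction-ratio estimator (plus auxiliary dual variables), but
# the exact inputs and outputs depend on the environment and are not shown here.
state_dim = 4
ratio_net = make_mlp(state_dim, 1)
optimizer = torch.optim.Adam(ratio_net.parameters(), lr=LEARNING_RATE)
```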