GenDICE: Generalized Offline Estimation of Stationary Values
Authors: Ruiyi Zhang*, Bo Dai*, Lihong Li, Dale Schuurmans
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove its consistency under general conditions, provide an error analysis, and demonstrate strong empirical performance on benchmark problems, including offline PageRank and off-policy policy evaluation. In this section, we evaluate GenDICE on OPE and OPR problems. |
| Researcher Affiliation | Collaboration | Ruiyi Zhang¹, Bo Dai², Lihong Li², Dale Schuurmans²; ¹Duke University, ²Google Research, Brain Team |
| Pseudocode | Yes | Algorithm 1: GenDICE (with function approximators) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/zhangry868/GenDICE. |
| Open Datasets | Yes | We test GenDICE on a Barabási-Albert (BA) graph (synthetic) and two real-world graphs, Cora and Citeseer. Details of the graphs are given in Appendix D. The two real-world graphs are built from citation networks. (A construction sketch for the synthetic graph appears after the table.) |
| Dataset Splits | No | The paper mentions collecting a "fixed number of trajectories" and using "off-policy data", but it does not provide specific details on how these datasets are split into training, validation, and test sets, such as percentages, absolute counts, or references to predefined standard splits for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU types (e.g., Intel Xeon, AMD Ryzen), or memory specifications. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and neural network architectures (e.g., "feed-forward with two hidden layers of dimension 64 and tanh activations"), but it does not specify any software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | We train each stationary distribution correction estimation method using the Adam optimizer with batches of size 2048 and learning rates chosen by a hyperparameter search over {0.0001, 0.0003, 0.001, 0.003}, with 0.0003 selected as the best. All neural networks are feed-forward with two hidden layers of dimension 64 and tanh activations. (See the configuration sketch after the table.) |
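
For the "Open Datasets" row, the synthetic BA graph can be regenerated with standard tooling. The sketch below is a minimal illustration using networkx; the node count and attachment parameter are placeholders, since the actual graph sizes are reported in Appendix D of the paper and are not part of this excerpt.

```python
# Sketch (not from the paper): building a synthetic Barabasi-Albert graph of the
# kind used in the offline PageRank experiments. Node/edge counts are illustrative.
import networkx as nx
import numpy as np

def make_ba_graph(num_nodes=100, num_edges_per_node=4, seed=0):
    """Build a BA graph and return it with its row-normalized transition matrix."""
    graph = nx.barabasi_albert_graph(num_nodes, num_edges_per_node, seed=seed)
    adjacency = nx.to_numpy_array(graph)
    # Normalize each row so it defines a distribution over neighboring nodes.
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)
    return graph, transition

graph, transition = make_ba_graph()
print(transition.shape)  # (100, 100) with the placeholder settings above
```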
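For the "Experiment Setup" row, a minimal sketch of the reported configuration follows. The paper does not name a deep-learning framework (see the "Software Dependencies" row), so PyTorch and the `make_mlp` / `ratio_net` names below are assumptions made purely for illustration; only the hidden-layer sizes, activations, batch size, and learning rate come from the quoted setup.

```python
# Sketch (framework assumed, not specified in the paper): feed-forward networks
# with two hidden layers of width 64 and tanh activations, trained with Adam,
# batch size 2048, learning rate 3e-4 chosen from {1e-4, 3e-4, 1e-3, 3e-3}.
import torch
import torch.nn as nn

def make_mlp(input_dim, output_dim, hidden_dim=64):
    """Two hidden layers of dimension 64 with tanh activations, as reported."""
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim), nn.Tanh(),
        nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
        nn.Linear(hidden_dim, output_dim),
    )

BATCH_SIZE = 2048
LEARNING_RATE = 3e-4  # best value from the reported hyperparameter search

# `state_dim` and `ratio_net` are illustrative: GenDICE trains a stationary
# distribution correction-ratio estimator (plus auxiliary dual variables), but
# the exact inputs and outputs depend on the environment and are not shown here.
state_dim = 4
ratio_net = make_mlp(state_dim, 1)
optimizer = torch.optim.Adam(ratio_net.parameters(), lr=LEARNING_RATE)
```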