GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
Authors: Shangtong Zhang, Bo Liu, Shimon Whiteson
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide empirical results demonstrating the advantages of GradientDICE over GenDICE and DualDICE. |
| Researcher Affiliation | Academia | ¹University of Oxford, ²Auburn University. Correspondence to: Shangtong Zhang <shangtong.zhang@cs.ox.ac.uk>. |
| Pseudocode | Yes | Algorithm 1 Projected Gradient DICE |
| Open Source Code | Yes | The implementations are made publicly available for future research (https://github.com/ShangtongZhang/DeepRL). |
| Open Datasets | Yes | We consider two variants of Boyan's Chain (Boyan, 1999) as shown in Figure 4. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning was found. |
| Hardware Specification | No | The paper mentions 'NVIDIA' in the acknowledgments for an 'equipment grant', but does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) were mentioned. |
| Experiment Setup | Yes | We use neural networks to parameterize τ and f, each of which is represented by a two-hidden-layer network with 64 hidden units and ReLU (Nair & Hinton, 2010) activation function. ... We train each algorithm for 10^3 steps and examine MSE(ρ) ≐ ½(ρ_γ(π) − ρ̂_γ(π))² every 10 steps... We use SGD to train the neural networks with batch size 128. The learning rate α and the penalty coefficient λ are tuned from {0.01, 0.005, 0.001} and {0.1, 1} with grid search to minimize MSE(ρ) at the end of training. |
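
For concreteness, the quoted experiment setup could be sketched in PyTorch roughly as below. This is only an illustrative sketch, not the authors' implementation (that lives in the repository linked above): the GradientDICE objective itself is left as a caller-supplied `loss_fn`, and the names `make_net`, `run_trial`, `grid_search`, `sample_batch`, and `estimate_rho` are hypothetical, used only to mirror the architecture, batch size, step count, evaluation interval, and hyper-parameter grids described in the quote.

```python
import itertools
import torch
import torch.nn as nn


def make_net(in_dim, out_dim=1):
    """Two-hidden-layer MLP with 64 hidden units and ReLU activations."""
    return nn.Sequential(
        nn.Linear(in_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, out_dim),
    )


def mse_rho(rho_true, rho_hat):
    """MSE(ρ) ≐ ½(ρ_γ(π) − ρ̂_γ(π))²."""
    return 0.5 * (rho_true - rho_hat) ** 2


def run_trial(loss_fn, sample_batch, estimate_rho, rho_true, state_dim,
              alpha, lam, steps=1000, batch_size=128, eval_every=10):
    """Train τ and f for `steps` SGD updates, logging MSE(ρ) every `eval_every` steps."""
    tau, f = make_net(state_dim), make_net(state_dim)
    params = list(tau.parameters()) + list(f.parameters())
    opt = torch.optim.SGD(params, lr=alpha)
    curve = []
    for step in range(1, steps + 1):
        batch = sample_batch(batch_size)
        opt.zero_grad()
        # `loss_fn` stands in for the GradientDICE objective with penalty
        # coefficient `lam`; it is supplied by the caller, not defined here.
        loss = loss_fn(tau, f, batch, lam)
        loss.backward()
        opt.step()
        if step % eval_every == 0:
            curve.append(mse_rho(rho_true, estimate_rho(tau)))
    return curve


def grid_search(loss_fn, sample_batch, estimate_rho, rho_true, state_dim):
    """Select (α, λ) that minimizes MSE(ρ) at the end of training."""
    best = None
    for alpha, lam in itertools.product([0.01, 0.005, 0.001], [0.1, 1.0]):
        curve = run_trial(loss_fn, sample_batch, estimate_rho,
                          rho_true, state_dim, alpha, lam)
        if best is None or curve[-1] < best[0]:
            best = (curve[-1], alpha, lam)
    return best
```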