DiCE: The Infinitely Differentiable Monte Carlo Estimator
Authors: Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the correctness of DiCE both through a proof and numerical evaluation of the DiCE derivative estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at github.com/alshedivat/lola. |
| Researcher Affiliation | Academia | 1University of Oxford 2Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1 LOLA-DiCE: policy gradient update for θ₁ (a minimal sketch of the underlying DiCE operator appears after the table) |
| Open Source Code | Yes | Our code is available at github.com/alshedivat/lola. |
| Open Datasets | No | The paper mentions evaluating on the "iterated prisoner's dilemma (IPD)" but does not cite or link a specific, named, publicly available dataset for training; data is generated through interaction with this environment. |
| Dataset Splits | No | The paper does not provide specific details about training, validation, and test dataset splits. The experiments are conducted in a reinforcement learning setting where data is generated through interaction. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, memory, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions using "TensorFlow (Abadi et al., 2016) or PyTorch (Paszke et al., 2017)" and refers to pyro.infer.util.Dice and tensorflow_probability.python.monte_carlo.expectation. However, it does not provide version numbers for these libraries or frameworks, which would be needed for reproducible software dependencies. |
| Experiment Setup | Yes | In our experiments, we use a time horizon of 150 steps and a reduced batch size of 64; the lookahead gradient step, α, is set to 1 and the learning rate is 0.3. |
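The pseudocode row above quotes Algorithm 1 (LOLA-DiCE), which is built on the paper's "magic box" operator ⊞(W) = exp(τ − ⊥(τ)), where τ = Σ_{w∈W} log p(w; θ) and ⊥ denotes stop-gradient. Below is a minimal PyTorch sketch of this operator; the function name `magic_box` and the Bernoulli sanity check are ours for illustration, not code taken from the paper's repository.

```python
import torch

def magic_box(logp):
    # DiCE magic-box operator: exp(tau - stop_gradient(tau)).
    # Its forward value is exactly 1, but differentiating it injects the
    # score-function term d(tau)/d(theta) into any objective it multiplies.
    return torch.exp(logp - logp.detach())

# Toy sanity check (illustrative, not from the paper): estimate
# d/dtheta E_{x ~ Bernoulli(sigmoid(theta))}[x], whose true value at
# theta = 0 is p * (1 - p) = 0.25.
theta = torch.tensor(0.0, requires_grad=True)
p = torch.sigmoid(theta)
x = torch.bernoulli(p.detach().expand(100_000))        # samples, no gradient path
logp = x * torch.log(p) + (1 - x) * torch.log(1 - p)   # per-sample log-likelihood
surrogate = (magic_box(logp) * x).mean()               # DiCE surrogate for E[x]
surrogate.backward()
print(theta.grad)  # approx. 0.25
```

Because `magic_box(logp)` evaluates to 1 in the forward pass, the surrogate has the same value as the naive Monte Carlo estimate of E[x], while its derivatives (including higher-order ones) match the score-function gradient estimator; this is the correctness property the paper verifies by proof and numerical evaluation.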
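For sequential settings like the IPD experiments described in the Experiment Setup row, the paper applies the magic-box operator per timestep, weighting each reward by the magic box of the log-probabilities of all actions that causally precede it. The sketch below uses our own naming, omits discounting, and substitutes random placeholders for real policy log-probabilities; the shapes follow the quoted horizon of 150 steps and batch size of 64.

```python
import torch

def dice_objective(logps, rewards):
    # logps, rewards: (batch, horizon) tensors of per-step action
    # log-probabilities and rewards for sampled trajectories.
    cum_logps = torch.cumsum(logps, dim=1)                    # r_t depends on actions a_0..a_t
    per_step_box = torch.exp(cum_logps - cum_logps.detach())  # magic box applied per timestep
    return (per_step_box * rewards).sum(dim=1).mean()

# Placeholder inputs standing in for a real policy's outputs.
batch, horizon = 64, 150
logps = (0.01 * torch.randn(batch, horizon)).requires_grad_()
rewards = torch.randn(batch, horizon)
loss = -dice_objective(logps, rewards)
loss.backward()  # gradients reach logps and, in practice, the policy parameters
```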