DiCE: The Infinitely Differentiable Monte Carlo Estimator

Authors: Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify the correctness of DiCE both through a proof and numerical evaluation of the DiCE derivative estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at github.com/alshedivat/lola.
Researcher Affiliation | Academia | 1University of Oxford, 2Carnegie Mellon University.
Pseudocode | Yes | Algorithm 1 LOLA-DiCE: policy gradient update for θ1 (a minimal sketch of the underlying DiCE operator is given after this table).
Open Source Code | Yes | Our code is available at github.com/alshedivat/lola.
Open Datasets | No | The paper mentions evaluating on the "iterated prisoner's dilemma (IPD)" but does not refer to a specific, named, publicly available dataset with a citation or link; data is generated through interaction within this environment.
Dataset Splits | No | The paper does not provide details about training, validation, and test dataset splits. The experiments are conducted in a reinforcement learning setting where data is generated through interaction.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, memory, or cloud instance specifications).
Software Dependencies | No | The paper mentions using "TensorFlow (Abadi et al., 2016) or PyTorch (Paszke et al., 2017)" and refers to pyro.infer.util.Dice and tensorflow_probability.python.monte_carlo.expectation. However, it does not provide version numbers for these libraries or frameworks, which would be necessary for reproducible software dependencies.
Experiment Setup | Yes | In our experiments, we use a time horizon of 150 steps and a reduced batch size of 64; the lookahead gradient step, α, is set to 1 and the learning rate is 0.3.
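
The Pseudocode and Software Dependencies rows above point at the DiCE ("magic box") operator on which Algorithm 1 (LOLA-DiCE) is built. As a rough illustration only, below is a minimal PyTorch sketch of that operator and a DiCE-style surrogate objective. The function names, tensor shapes, and the omission of discounting and of the paper's baseline term are assumptions made here for brevity; the authors' actual implementation is the code released at github.com/alshedivat/lola.

```python
import torch

def magic_box(x):
    # MagicBox(x) = exp(x - stop_gradient(x)). It evaluates to 1 in the
    # forward pass, but its derivative re-introduces the score-function term,
    # so gradients of any order are produced by ordinary autodiff.
    return torch.exp(x - x.detach())

def dice_objective(log_probs, rewards):
    # log_probs: (batch, T) log-probabilities of the sampled actions,
    #            produced by a policy that depends on the parameters theta.
    # rewards:   (batch, T) per-step rewards from the sampled trajectories.
    # Each reward r_t depends on all stochastic actions up to time t, so it
    # is weighted by the magic box of the cumulative log-probability up to t.
    cum_log_probs = torch.cumsum(log_probs, dim=1)
    per_step = magic_box(cum_log_probs) * rewards
    return per_step.sum(dim=1).mean()
```

Because magic_box evaluates to 1 while keeping the dependence on log_probs in the graph, the forward value of dice_objective is simply the average return, yet repeated differentiation (e.g. torch.autograd.grad(loss, params, create_graph=True)) yields the higher-order score-function estimators whose correctness the paper verifies numerically.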