DiCE: The Infinitely Differentiable Monte Carlo Estimator
Authors: Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the correctness of DiCE both through a proof and numerical evaluation of the DiCE derivative estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at github.com/alshedivat/lola. |
| Researcher Affiliation | Academia | 1University of Oxford 2Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1 LOLA-DiCE: policy gradient update for θ₁ (a minimal sketch of the underlying DiCE operator appears after the table) |
| Open Source Code | Yes | Our code is available at github.com/alshedivat/lola. |
| Open Datasets | No | The paper mentions evaluating on the "iterated prisoner's dilemma (IPD)" but does not cite or link a specific, named, publicly available dataset for training; data is generated through interaction with this environment. |
| Dataset Splits | No | The paper does not provide specific details about training, validation, and test dataset splits. The experiments are conducted in a reinforcement learning setting where data is generated through interaction. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, memory, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions using "TensorFlow (Abadi et al., 2016) or PyTorch (Paszke et al., 2017)" and refers to pyro.infer.util.Dice and tensorflow_probability.python.monte_carlo.expectation. However, it does not provide version numbers for these libraries or frameworks, which would be needed for reproducible software dependencies. |
| Experiment Setup | Yes | In our experiments, we use a time horizon of 150 steps and a reduced batch size of 64; the lookahead gradient step, α, is set to 1 and the learning rate is 0.3. |
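The pseudocode row above quotes Algorithm 1 (LOLA-DiCE), which is built on the paper's "magic box" operator ⊞(W) = exp(τ − ⊥(τ)), where τ = Σ_{w∈W} log p(w; θ) and ⊥ denotes stop-gradient. Below is a minimal PyTorch sketch of this operator; the function name `magic_box` and the Bernoulli sanity check are ours for illustration, not code taken from the paper's repository.

```python
import torch

def magic_box(logp):
    # DiCE magic-box operator: exp(tau - stop_gradient(tau)).
    # Its forward value is exactly 1, but differentiating it injects the
    # score-function term d(tau)/d(theta) into any objective it multiplies.
    return torch.exp(logp - logp.detach())

# Toy sanity check (illustrative, not from the paper): estimate
# d/dtheta E_{x ~ Bernoulli(sigmoid(theta))}[x], whose true value at
# theta = 0 is p * (1 - p) = 0.25.
theta = torch.tensor(0.0, requires_grad=True)
p = torch.sigmoid(theta)
x = torch.bernoulli(p.detach().expand(100_000))        # samples, no gradient path
logp = x * torch.log(p) + (1 - x) * torch.log(1 - p)   # per-sample log-likelihood
surrogate = (magic_box(logp) * x).mean()               # DiCE surrogate for E[x]
surrogate.backward()
print(theta.grad)  # approx. 0.25
```

Because `magic_box(logp)` evaluates to 1 in the forward pass, the surrogate has the same value as the naive Monte Carlo estimate of E[x], while its derivatives (including higher-order ones) match the score-function gradient estimator; this is the correctness property the paper verifies by proof and numerical evaluation.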
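For sequential settings like the IPD experiments described in the Experiment Setup row, the paper applies the magic-box operator per timestep, weighting each reward by the magic box of the log-probabilities of all actions that causally precede it. The sketch below uses our own naming, omits discounting, and substitutes random placeholders for real policy log-probabilities; the shapes follow the quoted horizon of 150 steps and batch size of 64.

```python
import torch

def dice_objective(logps, rewards):
    # logps, rewards: (batch, horizon) tensors of per-step action
    # log-probabilities and rewards for sampled trajectories.
    cum_logps = torch.cumsum(logps, dim=1)                    # r_t depends on actions a_0..a_t
    per_step_box = torch.exp(cum_logps - cum_logps.detach())  # magic box applied per timestep
    return (per_step_box * rewards).sum(dim=1).mean()

# Placeholder inputs standing in for a real policy's outputs.
batch, horizon = 64, 150
logps = (0.01 * torch.randn(batch, horizon)).requires_grad_()
rewards = torch.randn(batch, horizon)
loss = -dice_objective(logps, rewards)
loss.backward()  # gradients reach logps and, in practice, the policy parameters
```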