Open-ended learning in symmetric zero-sum games
Authors: David Balduzzi, Marta Garnelo, Yoram Bachrach, Wojciech Czarnecki, Julien Perolat, Max Jaderberg, Thore Graepel
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives. We investigated the performance of the proposed algorithms in two highly nontransitive resource allocation games. ... Figure 4. Performance of PSRO_rN relative to self-play, PSRO_U and PSRO_N on Blotto (left) and Differentiable Lotto (right). In all cases, the relative performance of PSRO_rN is positive, and therefore outperforms the other algorithms. |
| Researcher Affiliation | Industry | 1DeepMind. Correspondence to: <dbalduzzi@google.com>. |
| Pseudocode | Yes | Algorithm 1 Optimization (against a fixed opponent)... Algorithm 2 Self-play... Algorithm 3 Response to Nash (PSRO_N)... Algorithm 4 Response to rectified Nash (PSRO_rN). |
| Open Source Code | No | The paper does not contain any statement about releasing source code or provide any links to a code repository for the described methodology. |
| Open Datasets | Yes | We investigated the performance of the proposed algorithms in two highly nontransitive resource allocation games. Colonel Blotto (Borel, 1921; Tukey, 1949; Roberson, 2006) ... In Blotto, we investigate performance for a = 3 areas and c = 10 coins over k = 1000 games. Differentiable Lotto is inspired by continuous Lotto (Hart, 2008). ... Differentiable Lotto experiments are from k = 500 games with c = 9 customers chosen uniformly at random in the square [−1, 1]². |
| Dataset Splits | No | The paper describes experiments within game simulations (Colonel Blotto and Differentiable Lotto) and specifies the parameters for these simulations, but it does not mention traditional training, validation, or test dataset splits for a fixed dataset, as the data is generated through game play. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or specific cloud instance types used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'maximum a posteriori policy optimization (MPO) (Abdolmaleki et al., 2018)' and 'gradient ascent' as oracles, but does not provide specific software names with version numbers for any libraries or frameworks used. |
| Experiment Setup | Yes | In Blotto, we investigate performance for a = 3 areas and c = 10 coins over k = 1000 games. An agent outputs a vector in R^3, which is passed to a softmax and discretized to obtain three integers summing to 10. Differentiable Lotto experiments are from k = 500 games with c = 9 customers chosen uniformly at random in the square [−1, 1]². ... We impose that agents have width equal to one. |
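The Blotto agent parameterization quoted above (a vector in R^3 passed to a softmax and discretized into three non-negative integers summing to c = 10 coins) can be sketched as follows. This is a minimal illustration, not the paper's code: the paper does not specify the rounding rule, so largest-remainder rounding is assumed here, and the function name `blotto_allocation` is invented for this example.

```python
import numpy as np

def blotto_allocation(logits, coins=10):
    """Map a raw agent output in R^3 to an integer allocation of
    `coins` coins over three areas: softmax, scale by `coins`,
    then discretize (largest-remainder rounding, an assumption)."""
    # Numerically stable softmax over the three areas.
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Scale to the coin budget and take integer floors.
    raw = probs * coins
    alloc = np.floor(raw).astype(int)
    # Hand out the remaining coins to the largest fractional parts,
    # so the allocation sums exactly to `coins`.
    remainder = coins - alloc.sum()
    order = np.argsort(raw - alloc)[::-1]
    alloc[order[:remainder]] += 1
    return alloc

# Example usage: three integers summing to 10.
print(blotto_allocation(np.array([1.0, 0.5, -0.3])))
```

Any such discretization keeps the agent's search space continuous (the softmax logits) while the game itself is played over integer allocations.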