EigenGame Unloaded: When playing games is better than optimizing

Authors: Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate its performance with extensive experiments including dimensionality reduction of massive data sets and clustering a large social network graph.
Researcher Affiliation | Industry | Ian Gemp, Brian McWilliams, Claire Vernade & Thore Graepel, DeepMind, London, UK. {imgemp,bmcw,vernade}@deepmind.com, thoregraepel@gmail.com
Pseudocode | Yes | Algorithm 1 presents pseudocode for µ-EigenGame where computation is parallelized over the k players. (A JAX sketch of this style of update appears after the table.)
Open Source Code | No | For the sake of reproducibility we have included pseudocode in Jax. We use the Optax optimization library (Hessel et al., 2020) and the Jaxline training framework.
Open Datasets | Yes | We compare µ-EigenGame against α-EigenGame, GHA (Sanger, 1989), Matrix Krasulina (Tang, 2019), and Oja's algorithm (Allen-Zhu and Li, 2017) on the MNIST dataset. ... The dataset consists of a subset of the 40 billion words used to train the transformer-based Meena language model (Adiwardana et al., 2020). ... The Facebook graph consists of 134,833 nodes, 1,380,293 edges, and 8 connected components... (Leskovec and Krevl, 2014; Rozemberczki et al., 2019).
Dataset Splits | No | For MNIST, it states 'Learning rates were chosen from {10^-3, ..., 10^-6} on 10 held out runs,' which implies hyperparameter tuning, but it does not specify explicit training/validation/test dataset split percentages or sample counts. It refers to a 'training set' but provides no details on how it was split.
Hardware Specification | Yes | Specifically we consider the parallel framework specified by TPUv3 available in Google Cloud... We use minibatches of size 4,096 in each TPU. We do model parallelism across 4 TPUs... The experiment was run on a single CPU.
Software Dependencies | No | The paper mentions using the 'Optax optimization library' and 'Jaxline training framework' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We use minibatches of size 4,096 in each TPU. We compute and apply updates using SGD with a learning rate of 5 × 10^-5 and Nesterov momentum with a factor of 0.9. ... Learning rates were chosen from {10^-3, ..., 10^-6} on 10 held out runs. (An Optax configuration sketch appears after the table.)
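
The Pseudocode row notes that Algorithm 1 parallelizes the µ-EigenGame computation over the k players. Below is a minimal JAX sketch of that style of update, assuming the µ-EigenGame utility u_i = ⟨v_i, M v_i⟩ − Σ_{j<i} ⟨v_i, M v_j⟩⟨v_i, v_j⟩ with unit-norm players; the function names (player_grad, mu_eigengame_step) and the vmap-based parallelization are illustrative choices, not the paper's released pseudocode.

```python
# Sketch of a µ-EigenGame-style parallel player update (illustrative, not the
# authors' code). V holds k unit-norm eigenvector estimates as rows; M is a
# (minibatch estimate of a) symmetric d x d matrix.
import jax
import jax.numpy as jnp


def player_grad(V, M, i):
    """Riemannian gradient of player i's utility, parents j < i held fixed."""
    v_i = V[i]                               # (d,) current estimate for player i
    MV = M @ V.T                             # (d, k): column j is M v_j
    mask = (jnp.arange(V.shape[0]) < i).astype(V.dtype)  # 1 for parents j < i
    inner = V @ v_i                          # (k,): <v_j, v_i>
    inner_m = MV.T @ v_i                     # (k,): <v_i, M v_j> (M symmetric)
    reward = 2.0 * MV[:, i]                  # gradient of <v_i, M v_i>
    penalty = MV @ (mask * inner) + V.T @ (mask * inner_m)
    grad = reward - penalty
    return grad - jnp.dot(grad, v_i) * v_i   # project onto the sphere's tangent space


@jax.jit
def mu_eigengame_step(V, M, lr):
    """One ascent step for all k players in parallel, then retract to unit norm."""
    grads = jax.vmap(lambda i: player_grad(V, M, i))(jnp.arange(V.shape[0]))
    V_new = V + lr * grads
    return V_new / jnp.linalg.norm(V_new, axis=1, keepdims=True)
```

In practice M would be replaced by unbiased minibatch estimates and the vmap could be swapped for device-level parallelism over players, but those details are beyond this sketch.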
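
The Experiment Setup row reports SGD with a learning rate of 5 × 10^-5 and Nesterov momentum of 0.9, applied through Optax. The snippet below is a hedged sketch of that configuration; the parameter shapes, the apply_step helper, and the unit-sphere retraction are illustrative assumptions, not the authors' Jaxline training script.

```python
# Sketch of the reported optimizer settings using Optax (assumed wiring only).
import jax
import jax.numpy as jnp
import optax

LEARNING_RATE = 5e-5  # reported value; chosen from {10^-3, ..., 10^-6} on held-out runs
optimizer = optax.sgd(LEARNING_RATE, momentum=0.9, nesterov=True)

# Placeholder parameters: k unit-norm eigenvector estimates of dimension d.
k, d = 8, 784
V = jax.random.normal(jax.random.PRNGKey(0), (k, d))
V = V / jnp.linalg.norm(V, axis=1, keepdims=True)
opt_state = optimizer.init(V)


def apply_step(V, opt_state, utility_grads):
    """One optimizer step; utility_grads would come from the EigenGame utilities."""
    # Optax applies descent-style updates, so negate the (ascent) utility gradients.
    updates, opt_state = optimizer.update(-utility_grads, opt_state, V)
    V = optax.apply_updates(V, updates)
    # Retract back to the unit sphere after each step.
    V = V / jnp.linalg.norm(V, axis=1, keepdims=True)
    return V, opt_state
```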