EigenGame Unloaded: When playing games is better than optimizing
Authors: Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its performance with extensive experiments including dimensionality reduction of massive data sets and clustering a large social network graph. |
| Researcher Affiliation | Industry | Ian Gemp, Brian McWilliams, Claire Vernade & Thore Graepel, DeepMind, London, UK. {imgemp,bmcw,vernade}@deepmind.com, thoregraepel@gmail.com |
| Pseudocode | Yes | Algorithm 1 presents pseudocode for µ-EigenGame, where computation is parallelized over the k players. |
| Open Source Code | No | For the sake of reproducibility we have included pseudocode in JAX. We use the Optax optimization library (Hessel et al., 2020) and the Jaxline training framework. |
| Open Datasets | Yes | We compare µ-EigenGame against α-EigenGame, GHA (Sanger, 1989), Matrix Krasulina (Tang, 2019), and Oja's algorithm (Allen-Zhu and Li, 2017) on the MNIST dataset. ... The dataset consists of a subset of the 40 billion words used to train the transformer-based Meena language model (Adiwardana et al., 2020). ... The Facebook graph consists of 134,833 nodes, 1,380,293 edges, and 8 connected components... (Leskovec and Krevl, 2014; Rozemberczki et al., 2019). |
| Dataset Splits | No | For MNIST, it states 'Learning rates were chosen from {10⁻³, …, 10⁻⁶} on 10 held out runs,' which implies hyperparameter tuning, but it does not specify explicit training/validation/test dataset split percentages or sample counts. It refers to a 'training set' but provides no details on how it was split. |
| Hardware Specification | Yes | Specifically we consider the parallel framework specified by TPUv3 available in Google Cloud... We use minibatches of size 4,096 in each TPU. We do model parallelism across 4 TPUs... The experiment was run on a single CPU. |
| Software Dependencies | No | The paper mentions using 'Optax optimization library' and 'Jaxline training framework' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use minibatches of size 4,096 in each TPU. We compute and apply updates using SGD with a learning rate of 5 × 10⁻⁵ and Nesterov momentum with a factor of 0.9. ... Learning rates were chosen from {10⁻³, …, 10⁻⁶} on 10 held out runs. |
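
To make the Pseudocode and Experiment Setup rows above more concrete, the following is a minimal JAX sketch of an update of the general µ-EigenGame form (deflate each player's reward direction by higher-ranked players, project onto the tangent space of the unit sphere, then renormalize). This is an illustration under our own assumptions, not the authors' Algorithm 1; the function and variable names are ours, and the precise utilities and update rules are defined in the paper.

```python
import jax.numpy as jnp

def mu_eigengame_update(V, X, lr=5e-5):
    """One illustrative synchronous update of all k players (hypothetical sketch).

    V : (d, k) array of current eigenvector estimates, columns assumed unit-norm.
    X : (b, d) minibatch of data; the matrix M is estimated as X^T X / b.
    """
    b = X.shape[0]
    M_V = X.T @ (X @ V) / b                      # estimate of M v_i for each player, (d, k)
    coeffs = V.T @ M_V                           # coeffs[j, i] = <v_j, M v_i>, (k, k)
    penalties = V @ jnp.triu(coeffs, k=1)        # column i: sum over j < i of <v_j, M v_i> v_j
    grads = M_V - penalties                      # reward direction minus deflation penalty
    grads = grads - V * jnp.sum(grads * V, axis=0, keepdims=True)  # tangent-space projection
    V_new = V + lr * grads
    return V_new / jnp.linalg.norm(V_new, axis=0, keepdims=True)   # retract onto the sphere
```

In this sketch the strictly upper-triangular mask enforces the parent hierarchy (player i is only penalized by players j < i), which is the structural idea the pseudocode row refers to.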
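
The optimizer settings quoted in the Experiment Setup row (SGD, learning rate 5 × 10⁻⁵, Nesterov momentum 0.9) map directly onto Optax's SGD transformation. The snippet below is an illustrative configuration only, not the authors' Jaxline experiment code; variable names are ours.

```python
import optax

# SGD with Nesterov momentum, matching the hyperparameters quoted in the table.
optimizer = optax.sgd(learning_rate=5e-5, momentum=0.9, nesterov=True)

# Typical Optax usage. Optax assumes minimization, so for the EigenGame's
# utility *maximization* one would pass the negated ascent direction:
#   opt_state = optimizer.init(V)
#   updates, opt_state = optimizer.update(-ascent_direction, opt_state)
#   V = optax.apply_updates(V, updates)
```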