GAVEL: Generating Games via Evolution and Language Models
Authors: Graham Todd, Alexander G. Padula, Matthew Stephenson, Éric Piette, Dennis J.N.J. Soemers, Julian Togelius
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate both quantitatively and qualitatively that our approach is capable of generating new and interesting games, including in regions of the potential rules space not covered by existing games in the Ludii dataset. and We show empirically that GAVEL is capable of generating playable and interesting board games that differ substantially from games encountered during training. |
| Researcher Affiliation | Academia | Graham Todd, New York University Tandon, Brooklyn, New York, USA (gdrtodd@nyu.edu); Alexander G. Padula, ETH Zurich, Zurich, Switzerland (apadula@ethz.ch); Matthew Stephenson, Flinders University, Adelaide, Australia (matthew.stephenson@flinders.edu.au); Éric Piette, UCLouvain, Louvain-la-Neuve, Belgium (eric.piette@uclouvain.be); Dennis J.N.J. Soemers, Maastricht University, Maastricht, the Netherlands (dennis.soemers@maastrichtuniversity.nl); Julian Togelius, New York University Tandon, Brooklyn, New York, USA (julian.togelius@nyu.edu) |
| Pseudocode | Yes | Algorithm 1 GAVEL Game Evaluation |
| Open Source Code | Yes | Code and data available here: https://github.com/gdrtodd/gavel |
| Open Datasets | Yes | We construct our initial game dataset out of the 1182 existing games that have been translated into the Ludii game description language (available under a Creative Commons BY-NC-ND 4.0 license). and We provide a link to a public repository that includes our code and data, including a trained model checkpoint, as a footnote at the end of the introduction and here: https://github.com/gdrtodd/gavel |
| Dataset Splits | Yes | From this reduced dataset, we hold out a set of 14 varied games (available in Appendix A) that are used to initialize the evolutionary search, with the remaining 574 games being used as our training dataset. |
| Hardware Specification | Yes | Training took approximately 40 hours to complete on a single RTX8000 GPU. and Each run lasted roughly 48 hours using a single RTX8000 GPU for inference from the Code Llama-13b model and performing evaluations in parallel with 16 CPU cores and 128GB of total memory. |
| Software Dependencies | No | The paper mentions using 'Code Llama [52] (specifically Code Llama-13b),' 'parameter-efficient fine-tuning [39] and 8-bit quantization [22],' and the 'Pyribs library [60].' However, it does not specify version numbers for the programming language (e.g., Python), deep learning framework (e.g., PyTorch), or other core software dependencies needed for replication. A hedged sketch of how such a stack might be assembled is given after the table. |
| Experiment Setup | Yes | We fine-tune the model for a single epoch with hyperparameters available in Appendix B. and Appendix B lists: Number of epochs: 1, Batch size: 1, Sequence length: 1024, Optimizer: AdamW [37], Learning rate: 3e-4, Warmup Ratio: 0.03, LoRA Alpha: 16, LoRA Dropout: 0.05, LoRA r: 64. Also, Section 5 states: For each run, we select j = 3 games and generate k = 3 mutations for each game at each step. and Section 4.2 states: We then sample from the trained Code Llama-13b model with a temperature of 1 and a top-k value of 50 to generate a new expression (see the configuration sketch after the table). |
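
Since the paper does not pin its software dependencies, the following is a minimal sketch of how the described stack (Code Llama-13b loaded with 8-bit quantization and LoRA-based parameter-efficient fine-tuning) might be assembled with the Hugging Face `transformers`, `peft`, and `bitsandbytes` libraries. The model identifier and library choices are assumptions consistent with the paper's citations, not the authors' confirmed setup; only the LoRA hyperparameters (r = 64, alpha = 16, dropout = 0.05) come from Appendix B.

```python
# Hedged sketch: assembling the fine-tuning stack described in the paper.
# Library choices and the model identifier are assumptions; the paper does
# not specify versions or exact packages.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

MODEL_NAME = "codellama/CodeLlama-13b-hf"  # assumed Hub ID for Code Llama-13b

# 8-bit quantization (the paper cites [22], i.e. LLM.int8 via bitsandbytes)
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=quant_config,
    device_map="auto",
)

# Parameter-efficient fine-tuning with the LoRA hyperparameters from Appendix B
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirms only adapter weights are trainable
```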
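The reported experiment setup translates naturally into a training configuration and a sampling call. The sketch below, continuing from the loading sketch above (reusing its `model` and `tokenizer`), wires the quoted hyperparameters from Appendix B and Section 4.2 into Hugging Face `TrainingArguments` and `generate`; the output path, prompt string, and generation-length cap are placeholders, and the Trainer/dataset plumbing is omitted.

```python
# Hedged sketch of the training and sampling settings quoted in the table.
# Only the hyperparameters are taken from the paper; everything else is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gavel-codellama-13b-lora",  # placeholder path
    num_train_epochs=1,                     # Number of epochs: 1
    per_device_train_batch_size=1,          # Batch size: 1
    learning_rate=3e-4,                     # Learning rate: 3e-4
    warmup_ratio=0.03,                      # Warmup Ratio: 0.03
    optim="adamw_torch",                    # AdamW optimizer [37]
)
# Inputs would be tokenized to the paper's sequence length of 1024.

# Sampling a new expression (Section 4.2: temperature 1, top-k 50)
prompt = "(game "  # placeholder Ludii prefix; the paper's prompt format is not shown here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,
    top_k=50,
    max_new_tokens=256,  # assumed cap, not specified in the quoted text
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```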