Scaling Laws for a Multi-Agent Reinforcement Learning Model
Authors: Oren Neumann, Claudius Gros
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we present an extensive study of performance scaling for a cornerstone reinforcement learning algorithm, AlphaZero. On the basis of a relationship between Elo rating, playing strength and power-law scaling, we train AlphaZero agents on the games Connect Four and Pentago and analyze their performance. We find that player strength scales as a power law in neural network parameter count when not bottlenecked by available compute, and as a power of compute when training optimally sized agents. We observe nearly identical scaling exponents for both games. ... All code and data used in our experiments are available online (the Elo-to-power-law relation is illustrated in the first sketch below the table) |
| Researcher Affiliation | Academia | Oren Neumann & Claudius Gros, Institute for Theoretical Physics, Goethe University Frankfurt, Frankfurt am Main, Germany, {neumann,gros}@itp.uni-frankfurt.de |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code and data used in our experiments are available online: https://github.com/OrenNeumann/AlphaZero-scaling-laws |
| Open Datasets | No | The paper states 'Focusing on AlphaZero agents that are guided by neural nets with fully connected layers, we test our hypothesis on two popular board games: Connect Four and Pentago.' It also mentions 'Training is done using the AlphaZero Python implementation available in OpenSpiel (Lanctot et al., 2019).' However, AlphaZero agents generate their own training data through self-play, so there is no pre-existing dataset in the conventional sense for which access information could be provided. |
| Dataset Splits | No | The paper mentions 'We run matches with all trained agents, in order to calculate the Elo ratings...' and discusses 'training runs' but does not define explicit train/validation/test splits for a dataset. |
| Hardware Specification | No | The paper does not specify the particular hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It mentions 'available compute' and uses floating-point operations (FLOPs) as its compute metric, but does not name the hardware itself (a rough FLOPs-counting sketch follows the table). |
| Software Dependencies | No | The paper mentions 'Training is done using the AlphaZero Python implementation available in OpenSpiel (Lanctot et al., 2019).' While it names a software library (OpenSpiel), it does not provide a specific version number for OpenSpiel, Python, or any other relevant libraries, which is required for reproducibility. |
| Experiment Setup | Yes | In order to keep our results as general as possible, we try to avoid hyperparameter tuning and choose to train all agents with the values suggested by OpenSpiel's Python AlphaZero example. Table 3 contains a summary of all hyperparameters used and their meaning. Table 3 (excerpt), 'Training hyperparameters used for Connect Four, Pentago and Oware': c_uct = 2, learning rate = 0.001, temperature drop = varied (a hedged OpenSpiel configuration sketch follows the table). |
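
The scaling claim quoted in the Research Type row rests on the standard Elo model, in which a rating difference maps to an expected win probability, and on writing playing strength as a power law in parameter count. The sketch below is illustrative only: the parameter counts and ratings are made-up placeholders, not the paper's data, and the fit simply shows how a scaling exponent can be recovered from a log-log regression.

```python
import numpy as np

def expected_score(elo_a: float, elo_b: float) -> float:
    """Expected score of player A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

# Hypothetical (parameter count, Elo) pairs standing in for trained agents.
params = np.array([1e3, 1e4, 1e5, 1e6])
elo = np.array([-200.0, 150.0, 500.0, 850.0])

# Under the Bradley-Terry parameterization Elo = 400 * log10(strength),
# a power law strength ~ N^alpha is linear in log-log space, so a
# least-squares fit against log10(N) recovers the exponent alpha.
log_strength = elo / 400.0
alpha, intercept = np.polyfit(np.log10(params), log_strength, 1)
print(f"fitted scaling exponent alpha ~ {alpha:.2f}")
print(f"P(largest net beats smallest) = {expected_score(elo[-1], elo[0]):.3f}")
```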
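
The Hardware Specification row notes that the paper reports compute in FLOPs rather than naming hardware. One common back-of-the-envelope estimate for fully connected nets, sketched below, counts roughly two FLOPs per weight per forward pass; the layer widths, simulation counts, and game counts are placeholders, not the paper's accounting.

```python
def mlp_forward_flops(widths):
    """Rough FLOPs for one forward pass of a fully connected net:
    each dense layer costs about 2 * fan_in * fan_out multiply-adds."""
    return sum(2 * w_in * w_out for w_in, w_out in zip(widths, widths[1:]))

# Hypothetical network and self-play volume (placeholders, not the paper's numbers).
flops_per_pass = mlp_forward_flops([126, 256, 256, 8])  # e.g. a small Connect Four net
mcts_simulations_per_move = 300   # one network evaluation per MCTS simulation
moves_per_game = 36
games = 100_000

total_flops = flops_per_pass * mcts_simulations_per_move * moves_per_game * games
print(f"self-play forward-pass compute ~ {total_flops:.2e} FLOPs")
```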
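
The Experiment Setup row quotes hyperparameters taken from OpenSpiel's Python AlphaZero example. Below is a minimal launch sketch, assuming the Config fields exposed by open_spiel.python.algorithms.alpha_zero; field names and defaults vary across OpenSpiel versions, and every value not quoted in the paper (network size, simulation count, paths) is a placeholder rather than the authors' setting.

```python
# A minimal sketch, assuming the Config namedtuple of OpenSpiel's
# open_spiel.python.algorithms.alpha_zero module; names may differ
# between OpenSpiel versions. Values marked "placeholder" are not
# taken from the paper.
from open_spiel.python.algorithms.alpha_zero import alpha_zero

config = alpha_zero.Config(
    game="connect_four",          # or "pentago"
    path="/tmp/az_connect_four",  # checkpoint directory (placeholder)
    learning_rate=0.001,          # quoted in the Experiment Setup row
    weight_decay=1e-4,            # placeholder
    train_batch_size=1024,        # placeholder
    replay_buffer_size=2 ** 16,   # placeholder
    replay_buffer_reuse=4,        # placeholder
    max_steps=0,                  # 0 = run until interrupted
    checkpoint_freq=100,          # placeholder
    actors=2,                     # self-play workers (placeholder)
    evaluators=1,                 # placeholder
    evaluation_window=100,        # placeholder
    eval_levels=7,                # placeholder
    uct_c=2,                      # c_uct from the paper's Table 3
    max_simulations=300,          # placeholder
    policy_alpha=1.0,             # Dirichlet-noise parameters (placeholders)
    policy_epsilon=0.25,
    temperature=1.0,
    temperature_drop=15,          # "varied" in the paper; 15 is a placeholder
    nn_model="mlp",               # fully connected nets, as in the paper
    nn_width=256,                 # placeholder; the paper sweeps network sizes
    nn_depth=4,                   # placeholder
    observation_shape=None,       # inferred from the game by the example
    output_size=None,
    quiet=True,
)

alpha_zero.alpha_zero(config)     # launches self-play training
```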