Scaling Laws for a Multi-Agent Reinforcement Learning Model

Authors: Oren Neumann, Claudius Gros

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we present an extensive study of performance scaling for a cornerstone reinforcement learning algorithm, AlphaZero. On the basis of a relationship between Elo rating, playing strength and power-law scaling, we train AlphaZero agents on the games Connect Four and Pentago and analyze their performance. We find that player strength scales as a power law in neural network parameter count when not bottlenecked by available compute, and as a power of compute when training optimally sized agents. We observe nearly identical scaling exponents for both games. ... All code and data used in our experiments are available online. (A minimal sketch of the Elo/power-law relationship follows the table.)
Researcher Affiliation | Academia | Oren Neumann & Claudius Gros, Institute for Theoretical Physics, Goethe University Frankfurt, Frankfurt am Main, Germany. {neumann,gros}@itp.uni-frankfurt.de
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | All code and data used in our experiments are available online: https://github.com/OrenNeumann/AlphaZero-scaling-laws
Open Datasets | No | The paper states 'Focusing on AlphaZero agents that are guided by neural nets with fully connected layers, we test our hypothesis on two popular board games: Connect Four and Pentago.' It also mentions 'Training is done using the AlphaZero Python implementation available in OpenSpiel (Lanctot et al., 2019).' However, AlphaZero agents generate their own training data through self-play, so there is no pre-existing dataset in the conventional sense for which access information could be provided.
Dataset Splits | No | The paper mentions 'We run matches with all trained agents, in order to calculate the Elo ratings...' and discusses 'training runs', but does not define explicit train/validation/test splits for a dataset. (An illustrative Elo-fitting sketch follows the table.)
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It mentions 'available compute' and counts floating-point operations (FLOPs) as a metric, but does not name the hardware itself.
Software Dependencies | No | The paper mentions 'Training is done using the AlphaZero Python implementation available in OpenSpiel (Lanctot et al., 2019).' While it names the OpenSpiel framework, it does not provide version numbers for OpenSpiel, Python, or any other relevant libraries, which reproducibility would require.
Experiment Setup | Yes | In order to keep our results as general as possible, we try to avoid hyperparameter tuning and choose to train all agents with the values suggested by OpenSpiel's Python AlphaZero example. Table 3 contains a summary of all hyperparameters used and their meaning. Excerpt of Table 3 (training hyperparameters used for Connect Four, Pentago and Oware); a hedged configuration sketch follows the table:
Hyperparameter | Value | Description
c_uct | 2 | ...
Learning rate | 0.001 | ...
Temperature drop | varied | ...
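
The abstract quoted in the Research Type row rests on the link between Elo ratings and power-law scaling: the expected score under the Elo model is logistic in the rating difference, so an Elo rating that grows linearly in the logarithm of the parameter count is equivalent to playing strength scaling as a power law. Below is a minimal Python sketch of that relationship; the ratings and parameter counts are made up for illustration and are not the paper's data.

```python
import numpy as np

def expected_score(elo_a, elo_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

# Made-up ratings of agents trained without a compute bottleneck,
# indexed by neural-network parameter count (illustrative only).
params = np.array([1e3, 1e4, 1e5, 1e6])
elo = np.array([-350.0, -80.0, 190.0, 460.0])

# Elo growing linearly in log10(N) is the same statement as playing
# strength following a power law in N, which is what the paper reports.
slope, intercept = np.polyfit(np.log10(params), elo, 1)
print(f"Elo gain per decade of parameters: {slope:.1f}")
print(f"Expected score, largest vs. smallest agent: "
      f"{expected_score(elo[-1], elo[0]):.3f}")
```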
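
The Dataset Splits row quotes the paper's evaluation protocol: Elo ratings computed from matches between all trained agents. As context for how such ratings can be obtained from match outcomes, here is a simple online Elo-update fit; this is an illustrative scheme, not necessarily the estimator the authors used, and fit_elo, the K-factor, and the example matches are all assumptions of this sketch.

```python
import numpy as np

def fit_elo(match_results, n_agents, k=16.0, epochs=200):
    """Fit Elo ratings to (winner, loser) match results via repeated
    online Elo updates (illustrative, not the paper's method)."""
    ratings = np.zeros(n_agents)
    for _ in range(epochs):
        for winner, loser in match_results:
            expected = 1.0 / (
                1.0 + 10.0 ** ((ratings[loser] - ratings[winner]) / 400.0))
            ratings[winner] += k * (1.0 - expected)
            ratings[loser] -= k * (1.0 - expected)
    return ratings - ratings.mean()  # anchor the mean rating at zero

# Toy round-robin: agent 2 dominates, agent 0 loses every game.
matches = [(2, 0), (2, 1), (1, 0), (2, 0)]
print(fit_elo(matches, n_agents=3))
```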
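
The Experiment Setup row reproduces only a fragment of Table 3. Below is a minimal configuration sketch of those values, assuming a plain Python dictionary rather than OpenSpiel's actual configuration object; only the c_uct and learning-rate values come from the quoted excerpt, and every other key and value is a placeholder.

```python
# Hypothetical training configuration. Only "uct_c" and "learning_rate"
# reflect values quoted from Table 3; the other entries are placeholders
# standing in for rows elided ("...") in the excerpt.
config = {
    "game": "connect_four",    # the paper also trains on Pentago and Oware
    "uct_c": 2.0,              # c_uct: MCTS exploration constant (Table 3)
    "learning_rate": 0.001,    # optimizer step size (Table 3)
    "temperature_drop": None,  # "varied" across runs in the paper
    # ... remaining hyperparameters elided in the quoted excerpt ...
}
```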