Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
Authors: Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL. |
| Researcher Affiliation | Collaboration | 1University of Oxford, Oxford, United Kingdom 2Microsoft Research, Redmond, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using TorchCraft, but does not state that the code for the methodology described in this paper is open source or provide a link to it. |
| Open Datasets | No | The paper uses the StarCraft unit micromanagement environment but does not provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes training on sampled episodes from an experience replay memory but does not provide specific dataset split information (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Torch7' and 'TorchCraft' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We linearly anneal ε from 1.0 to 0.02 over 1500 episodes, and train the network for e_max = 2500 training episodes. In the standard training loop, we collect a single episode and add it to the replay memory at each training step. We sample batches of 30 episodes uniformly from the replay memory and train on fully unrolled episodes. In order to reduce the variance of the multi-agent importance weights, we clip them to the interval [0.01, 2]. (A sketch of this loop follows the table.) |
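
The experiment-setup row can be summarised as a short training-loop sketch. Only the quoted constants are from the paper (ε annealed from 1.0 to 0.02 over 1500 episodes, e_max = 2500 training episodes, batches of 30 episodes sampled uniformly, importance weights clipped to [0.01, 2]); the replay-memory capacity and the `collect_episode` / `train_on_batch` placeholders are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

# Constants quoted in the Experiment Setup row; everything else below
# (buffer capacity, episode representation) is an assumption for illustration.
EPS_START, EPS_END = 1.0, 0.02
ANNEAL_EPISODES = 1500
E_MAX = 2500            # total training episodes
BATCH_SIZE = 30         # episodes sampled per training step
IW_CLIP = (0.01, 2.0)   # clipping interval for multi-agent importance weights

def epsilon(episode):
    """Linearly anneal the exploration rate over the first 1500 episodes."""
    frac = min(episode / ANNEAL_EPISODES, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def clip_importance_weight(w):
    """Clip a multi-agent importance weight to [0.01, 2] to reduce variance."""
    return max(IW_CLIP[0], min(w, IW_CLIP[1]))

# Replay memory of whole episodes; the capacity is a placeholder assumption.
replay_memory = deque(maxlen=500)

def collect_episode(eps):
    """Placeholder: roll out one episode with epsilon-greedy agents."""
    return {"transitions": [], "epsilon": eps}

def train_on_batch(batch):
    """Placeholder: one gradient step on fully unrolled episodes."""
    pass

for e in range(E_MAX):
    # Standard loop described in the row above: collect one episode per
    # training step, add it to the replay memory, then sample a batch uniformly.
    replay_memory.append(collect_episode(epsilon(e)))
    if len(replay_memory) >= BATCH_SIZE:
        batch = random.sample(replay_memory, BATCH_SIZE)
        train_on_batch(batch)
```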