Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games
Authors: Siqi Liu, Marc Lanctot, Luke Marris, Nicolas Heess
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with Simplex-NeuPL across two domains. First, we study the imperfect-information game of goofspiel... Second, we explore the partially-observed, spatiotemporal strategy game of running-with-scissors... |
| Researcher Affiliation | Collaboration | 1 University College London, UK; 2 DeepMind, UK. Correspondence to: Siqi Liu <liusiqi@google.com>. |
| Pseudocode | Yes | Algorithm 1 Simplex Neural Population Learning; Algorithm 2 MGS implementing PSRO-NASH. |
| Open Source Code | No | The paper mentions and uses the OpenSpiel library, providing its citation and links to specific components within it (e.g., 'open_spiel/python/algorithms/policy_aggregator.py'), but does not explicitly state that the code for the Simplex Neural Population Learning method itself is open-sourced or provide a link to its implementation. |
| Open Datasets | Yes | The specific implementation of the game is available as part of OpenSpiel (Lanctot et al., 2019), instantiated with the following game string: goofspiel(imp_info=True, egocentric=True, num_cards=5, points_order=descending, returns_type=point_difference); a minimal loading sketch for this game string appears after the table. Lanctot, M., Lockhart, E., Lespiau, J.-B., Zambaldi, V., Upadhyay, S., Pérolat, J., Srinivasan, S., Timbers, F., Tuyls, K., Omidshafiei, S., Hennes, D., Morrill, D., Muller, P., Ewalds, T., Faulkner, R., Kramár, J., De Vylder, B., Saeta, B., Bradbury, J., Ding, D., Borgeaud, S., Lai, M., Schrittwieser, J., Anthony, T., Hughes, E., Danihelka, I., and Ryan-Davis, J. OpenSpiel: A framework for reinforcement learning in games. CoRR, abs/1908.09453, 2019. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits, as the data is generated dynamically through interaction with game environments rather than being loaded from a static dataset. |
| Hardware Specification | Yes | Across both domains, we used a single TPU-v2 both to perform gradient updates for neural population of policies and to serve their inference requests during simulation. The game simulation is then performed on 256 remote CPU actors for running-with-scissors and 128 for goofspiel. |
| Software Dependencies | No | The paper mentions using an MPO agent and the Open Spiel framework, but does not provide specific version numbers for the underlying software libraries such as deep learning frameworks (e.g., TensorFlow, PyTorch) or other key dependencies. |
| Experiment Setup | Yes | We used the same MPO agent (Abdolmaleki et al., 2018) as in Liu et al. (2022), with 20 action samples drawn and evaluated by the learned Q-function per state at each gradient update. The target (Q-value and policy) networks are updated every 100 gradient steps. The policy head and Q-value networks are parameterised by MLPs of (512, 256, 128, NUM_ACTIONS) and (512, 512, 128, 1) respectively with ELU activations. In goofspiel we invoke the meta-graph solver every 10,000 gradient updates... In running-with-scissors, we update the meta-graph every 1,000 gradient updates. Data are sampled uniformly from the replay server, with a maximum buffer size of 100,000 trajectories across both domains. (An illustrative sketch of the quoted network shapes appears after the table.) |
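
The goofspiel environment cited in the Open Datasets row is distributed with OpenSpiel, so it can be instantiated directly from the paper's game string. The sketch below assumes a standard `pip install open_spiel` setup and simply loads the game and rolls out one random episode; the parameter values are taken from the paper, while everything else (the random-rollout loop, the printed returns) is illustrative only.

```python
# Minimal sketch: load the goofspiel variant cited above via OpenSpiel and
# play one uniformly random episode to confirm the environment instantiates.
# Assumes `pip install open_spiel`; the game string is taken from the paper.
import random

import pyspiel

GAME_STRING = (
    "goofspiel(imp_info=True,egocentric=True,num_cards=5,"
    "points_order=descending,returns_type=point_difference)"
)

game = pyspiel.load_game(GAME_STRING)
state = game.new_initial_state()

while not state.is_terminal():
    if state.is_chance_node():
        # With points_order=descending there is no chance node, but this
        # branch keeps the rollout valid for other point orderings.
        outcomes, probs = zip(*state.chance_outcomes())
        state.apply_action(random.choices(outcomes, weights=probs)[0])
    else:
        state.apply_action(random.choice(state.legal_actions()))

# returns_type=point_difference makes the episode returns zero-sum,
# e.g. something like [3.0, -3.0].
print(state.returns())
```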
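
The Experiment Setup row lists only the MLP widths and activation, not the framework or input/output wiring. The following sketch shows one way to realise those shapes; the framework choice (PyTorch), the observation and action sizes, and the decision to feed the critic an observation concatenated with a candidate action are all assumptions for illustration, not details stated in the paper.

```python
# Illustrative sketch of the quoted network shapes:
# policy head (512, 256, 128, NUM_ACTIONS) and Q-network (512, 512, 128, 1),
# both with ELU activations. Framework and input sizes are assumptions.
import torch
import torch.nn as nn

NUM_ACTIONS = 16   # hypothetical; set to the game's action-space size
OBS_SIZE = 128     # hypothetical; set to the flattened observation size

# Policy head: outputs one logit per action.
policy_head = nn.Sequential(
    nn.Linear(OBS_SIZE, 512), nn.ELU(),
    nn.Linear(512, 256), nn.ELU(),
    nn.Linear(256, 128), nn.ELU(),
    nn.Linear(128, NUM_ACTIONS),
)

# Q-value network: per the setup, MPO scores 20 sampled actions per state
# with the learned critic; here the critic input is assumed to be the
# observation concatenated with a one-hot candidate action.
q_network = nn.Sequential(
    nn.Linear(OBS_SIZE + NUM_ACTIONS, 512), nn.ELU(),
    nn.Linear(512, 512), nn.ELU(),
    nn.Linear(512, 128), nn.ELU(),
    nn.Linear(128, 1),
)

obs = torch.zeros(1, OBS_SIZE)
logits = policy_head(obs)                       # action logits
action_one_hot = torch.zeros(1, NUM_ACTIONS)    # a candidate action to score
q_value = q_network(torch.cat([obs, action_one_hot], dim=-1))
```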