Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Improving Policies via Search in Cooperative Partially Observable Games
Authors: Adam Lerer, Hengyuan Hu, Jakob Foerster, Noam Brown
Pages: 7187-7194
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the benchmark challenge problem of Hanabi, our search technique greatly improves the performance of every agent we tested and when applied to a policy trained using RL achieves a new state-of-the-art score of 24.61 / 25 in the game, compared to a previous-best of 24.08 / 25. |
| Researcher Affiliation | Industry | Adam Lerer (Facebook AI Research), Hengyuan Hu (Facebook AI Research), Jakob Foerster (Facebook AI Research), Noam Brown (Facebook AI Research) |
| Pseudocode | No | A precise description of the algorithm is provided in this paper’s extended version. The provided paper text does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide code for single- and multi-agent search in Hanabi as well as a link to supplementary material at https://github.com/facebookresearch/Hanabi_SPARTA |
| Open Datasets | Yes | We evaluate our methods in the partially observable, fully cooperative game Hanabi, which at a high level resembles a cooperative extension of solitaire. Hanabi has recently been proposed as a new frontier for AI research (Bard et al. 2019) |
| Dataset Splits | No | The paper describes training an RL blueprint in a game environment ('train in self-play') rather than using static datasets with explicit train/validation/test splits. No specific dataset split information is provided. |
| Hardware Specification | Yes | All experiments except the imitation learning of Clone Bot and the reinforcement learning of RLBot were conducted on CPU using machines with Intel® Xeon® E5-2698 CPUs containing 40 cores each. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | After a minimum of 100 rollouts per action is performed... If the expected value for an action is not within 2 standard deviations of the expected value of the best action, its future MC rollouts are skipped. Furthermore, we use a configurable threshold for deviating from the blueprint action... We use a threshold of 0.05 in our experiments. |
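
The Experiment Setup excerpt above can be read as a small decision rule, sketched below in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `rollout` callable, the `max_rollouts` and `batch` parameters, and the use of a combined standard error as the "2 standard deviations" test are all assumptions, and the elided parts of the quoted text ("...") are not reconstructed. Only the constants 100, 2, and 0.05 come from the excerpt.

```python
import math
from statistics import mean, stdev

MIN_ROLLOUTS = 100          # minimum rollouts per action (from the excerpt)
PRUNE_SIGMAS = 2.0          # pruning width in standard deviations (from the excerpt)
DEVIATION_THRESHOLD = 0.05  # threshold for deviating from the blueprint (from the excerpt)


def select_action(actions, blueprint_action, rollout, max_rollouts=400, batch=100):
    """Pick an action by Monte Carlo search, falling back to the blueprint.

    `rollout(action)` is a hypothetical callable returning one sampled reward.
    `max_rollouts` and `batch` are illustrative budgets, not values from the paper.
    """
    # Every action first receives the minimum number of rollouts.
    samples = {a: [rollout(a) for _ in range(MIN_ROLLOUTS)] for a in actions}
    active = list(actions)

    while True:
        means = {a: mean(samples[a]) for a in active}
        # Assumption: "standard deviation" is taken as the standard error of the mean.
        sems = {a: stdev(samples[a]) / math.sqrt(len(samples[a])) for a in active}
        best = max(active, key=means.get)

        # Skip future rollouts for actions clearly worse than the current best.
        active = [
            a for a in active
            if means[best] - means[a] <= PRUNE_SIGMAS * math.hypot(sems[a], sems[best])
        ]

        pending = [a for a in active if len(samples[a]) < max_rollouts]
        if not pending:
            break
        for a in pending:
            samples[a].extend(rollout(a) for _ in range(batch))

    # Deviate from the blueprint only if search beats it by more than the threshold.
    gain = mean(samples[best]) - mean(samples[blueprint_action])
    return best if gain > DEVIATION_THRESHOLD else blueprint_action
```

With a noisy `rollout`, the threshold keeps the agent on the blueprint action unless search finds an estimated improvement larger than 0.05, which limits deviations driven by sampling noise alone.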