Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach
Authors: Weiyu Ma, Qirui Mi, Yongcheng Zeng, Xue Yan, Runji Lin, Yuqiao Wu, Jun Wang, Haifeng Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiment, we detail the setup and key metrics (evaluation metrics detailed in Appendix A.2) to evaluate macro-strategic decision-making in StarCraft II. We assess the Chain of Summarization's impact on LLM gameplay, compare various LLMs' performance, and evaluate their grasp of StarCraft II strategies. Our experiments conclude with human-AI interaction tests. |
| Researcher Affiliation | Academia | Weiyu Ma1,2, Qirui Mi1,2, Yongcheng Zeng1,2, Xue Yan1,2, Yuqiao Wu1,2, Runji Lin1,2, Haifeng Zhang1,2,4, Jun Wang3 1 Institute of Automation, Chinese Academy of Sciences, China 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, China 3 AI Centre, Department of Computer Science, UCL 4 Nanjing Artificial Intelligence Research of IA, China |
| Pseudocode | Yes | The pseudocode is as shown in Algorithm 1. |
| Open Source Code | Yes | All code and data from this study have been made publicly available at https://github.com/histmeisah/Large-Language-Models-play-StarCraftII |
| Open Datasets | Yes | All code and data from this study have been made publicly available at https://github.com/histmeisah/Large-Language-Models-play-StarCraftII |
| Dataset Splits | No | The paper mentions a 'Training Phase' and a 'Testing Phase' and that 'we fine-tuned open-source models using the entire dataset of GPT3.5-turbo-16k interaction logs with TextStarCraft II', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | Training Phase: To fine-tune open-source Large Language Models, we utilized two NVIDIA A100 40GB GPUs... Testing Phase: The development and testing of these fine-tuned models were performed using two NVIDIA A100 40GB GPUs. For running the StarCraft II environment, we employed an NVIDIA GeForce RTX 3060 GPU paired with a 13th Gen Intel(R) Core(TM) i5-13400F processor operating at 2.50 GHz |
| Software Dependencies | No | The paper mentions software components like 'python-sc2 framework', 'GPT3.5-turbo-16k', 'Llama2 70B', 'ChatGLM3 6B', 'Qwen 1.8B', and 'sc2reader library', but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Agent and Opponent Selection: To ensure a consistent and controlled experimental environment, LLM agents are configured to play as Protoss against Zerg AI opponents. ... Map Selection: For our experiments, we selected the Altitude LE and Ancient Cistern LE maps... Parameter Settings: The temperature parameter is fixed at 0.1 to focus on strategy-driven actions over randomness. Game Version: Our experiments were conducted across three different versions of the game to ensure robustness and applicability of the results. The versions tested include Patch 5.0.11, Patch 5.0.12, and Patch 5.0.13. |
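For reproduction purposes, the setup values quoted above can be collected in a single configuration object. The sketch below is illustrative only: the dictionary and key names are our own, and only the values (races, maps, temperature, game patches) come from the paper.

```python
# Illustrative reproduction config; key names are hypothetical,
# values are taken from the paper's Experiment Setup description.
EXPERIMENT_SETUP = {
    "agent_race": "Protoss",        # LLM agents play Protoss
    "opponent_race": "Zerg",        # against built-in Zerg AI opponents
    "maps": ["Altitude LE", "Ancient Cistern LE"],
    "temperature": 0.1,             # fixed to favor strategy-driven actions over randomness
    "game_versions": ["5.0.11", "5.0.12", "5.0.13"],
}

def describe(cfg: dict) -> str:
    """Render a one-line summary of the experimental configuration."""
    return (f"{cfg['agent_race']} vs {cfg['opponent_race']} AI on "
            f"{len(cfg['maps'])} maps, temperature={cfg['temperature']}, "
            f"patches {', '.join(cfg['game_versions'])}")

print(describe(EXPERIMENT_SETUP))
```

A harness for the python-sc2 framework mentioned under Software Dependencies could read these fields when launching matches, keeping the reported settings in one place.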