Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Authors: Weiyu Ma, Qirui Mi, Yongcheng Zeng, Xue Yan, Runji Lin, Yuqiao Wu, Jun Wang, Haifeng Zhang

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental In our experiment, we detail the setup and key metrics (evaluation metrics detailed in Appendix A.2) to evaluate macro-strategic decision-making in StarCraft II. We assess the Chain of Summarization's impact on LLM gameplay, compare various LLMs' performance, and evaluate their grasp of StarCraft II strategies. Our experiments conclude with human-AI interaction tests.
Researcher Affiliation Academia Weiyu Ma1,2, Qirui Mi1,2, Yongcheng Zeng1,2, Xue Yan1,2, Yuqiao Wu1,2, Runji Lin1,2, Haifeng Zhang1,2,4, Jun Wang3. 1 Institute of Automation, Chinese Academy of Sciences, China; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, China; 3 AI Centre, Department of Computer Science, UCL; 4 Nanjing Artificial Intelligence Research of IA, China
Pseudocode Yes The pseudocode is as shown in Algorithm 1.
Open Source Code Yes All code and data from this study have been made publicly available at https://github.com/histmeisah/Large-Language-Models-play-StarCraftII
Open Datasets Yes All code and data from this study have been made publicly available at https://github.com/histmeisah/Large-Language-Models-play-StarCraftII
Dataset Splits No The paper mentions 'Training Phase' and 'Testing Phase' and that 'we fine-tuned open-source models using the entire dataset of GPT3.5-turbo-16k interaction logs with TextStarCraft II', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification Yes Training Phase: To fine-tune open-source Large Language Models, we utilized two NVIDIA A100 40GB GPUs... Testing Phase: The development and testing of these fine-tuned models were performed using two NVIDIA A100 40GB GPUs. For running the StarCraft II environment, we employed an NVIDIA GeForce RTX 3060 GPU paired with a 13th Gen Intel(R) Core(TM) i5-13400F processor operating at 2.50 GHz
Software Dependencies No The paper mentions software components like 'python-sc2 framework', 'GPT3.5-turbo-16k', 'Llama2 70B', 'ChatGLM3 6B', 'Qwen 1.8B', and 'sc2reader library', but does not provide specific version numbers for these dependencies.
Experiment Setup Yes Agent and Opponent Selection: To ensure a consistent and controlled experimental environment, LLM agents are configured to play as Protoss against Zerg AI opponents. ... Map Selection: For our experiments, we selected the Altitude LE and Ancient Cistern LE maps... Parameter Settings: The temperature parameter is fixed at 0.1 to focus on strategy-driven actions over randomness. Game Version: Our experiments were conducted across three different versions of the game to ensure robustness and applicability of the results. The versions tested include Patch 5.0.11, Patch 5.0.12, and Patch 5.0.13.
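The rows above reference the Chain of Summarization pipeline and the temperature fixed at 0.1. As a rough illustration of how raw game frames might be condensed into a short prompt before each LLM call, here is a minimal pure-Python sketch. All names (`Observation`, `summarize_frames`, `build_llm_request`) are hypothetical, not taken from the paper's codebase, and the payload mirrors a generic chat-completion request rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical per-frame macro state; field names are illustrative.
    time_s: int
    minerals: int
    gas: int
    supply_used: int
    supply_cap: int


def summarize_frames(frames: list[Observation]) -> str:
    """Compress a window of raw frames into one line of text,
    keeping only the latest state plus a coarse mineral-income trend."""
    latest = frames[-1]
    dt = max(latest.time_s - frames[0].time_s, 1)
    mineral_rate = (latest.minerals - frames[0].minerals) / dt
    return (
        f"t={latest.time_s}s: minerals={latest.minerals} "
        f"(+{mineral_rate:.0f}/s), gas={latest.gas}, "
        f"supply={latest.supply_used}/{latest.supply_cap}"
    )


def build_llm_request(summary: str, model: str = "gpt-3.5-turbo-16k") -> dict:
    """Generic chat-completion payload; temperature is pinned at 0.1,
    matching the paper's setting that favours strategy over randomness."""
    return {
        "model": model,
        "temperature": 0.1,
        "messages": [{"role": "user", "content": summary}],
    }
```

A decision step would then send `build_llm_request(summarize_frames(window))` to the model and parse the returned macro action; the summarization keeps the prompt short enough to fit many decision cycles into the model's context window.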