CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

Authors: Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Yaodong Yang, Song-Chun Zhu

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game.
Researcher Affiliation | Collaboration | 1 National Key Laboratory of General Artificial Intelligence, BIGAI; 2 Peking University; 3 BUPT
Pseudocode | No | The paper describes the methods and network architectures in text and diagrams (e.g., Figure 5, Figure 11) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | The code is available at https://github.com/bigai-ai/civrealm.
Open Datasets | No | The paper describes the creation of mini-game "instances" and the use of "maps" for full games, which function as environments or tasks for training and evaluation. However, it does not provide concrete access information (e.g., specific links, DOIs, or formal citations) for a publicly available, conventionally split dataset (train/validation/test) used for the experiments.
Dataset Splits | No | The paper describes training models for a certain number of steps and on different mini-game instances or maps, but it does not specify any explicit train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper mentions parallelizing tensor environments with Ray but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using "Proximal Policy Optimization (PPO)" and "Ray" for tensor-based RL, and "GPT-3.5-turbo provided by Azure's OpenAI API" for LLM experiments. However, it does not provide specific version numbers for any of these software components as required for a reproducible description.
Experiment Setup | Yes | We configured the actor update for 5 epochs, employing a clipped value loss with a clip parameter of 0.2, and using one mini-batch per epoch. The coefficients assigned to the entropy term and value loss were 0.01 and 0.001, respectively. The length of each episode was set at 125 steps, and we collected training data across 8 parallel environments. The learning rate for the Adam [41] optimizer was established at 0.0005, with an optimizer epsilon of 0.00001.
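For reference, the PPO hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration. The sketch below is illustrative only: the variable and key names (e.g., ppo_config, n_rollout_envs) are assumptions rather than identifiers from the CivRealm codebase, and only the numerical values are taken from the paper.

```python
# Illustrative sketch of the reported PPO training hyperparameters.
# Key names are hypothetical; only the values come from the quoted setup.
ppo_config = {
    "ppo_epochs": 5,           # actor update epochs per iteration
    "clip_param": 0.2,         # clip parameter for the clipped value loss
    "num_mini_batch": 1,       # one mini-batch per epoch
    "entropy_coef": 0.01,      # coefficient of the entropy term
    "value_loss_coef": 0.001,  # coefficient of the value loss
    "episode_length": 125,     # steps per episode
    "n_rollout_envs": 8,       # parallel environments for data collection
    "lr": 5e-4,                # Adam learning rate
    "adam_eps": 1e-5,          # Adam optimizer epsilon
}

if __name__ == "__main__":
    # Print the configuration as a quick sanity check.
    for name, value in ppo_config.items():
        print(f"{name} = {value}")
```

A dictionary like this could be passed to whatever PPO trainer is used to reproduce the reported run; note that the paper does not specify hardware or software versions, so those would still need to be chosen independently.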