Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf
Authors: Xuanfa Jin, Ziyan Wang, Yali Du, Meng Fang, Haifeng Zhang, Jun Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of our proposed framework. |
| Researcher Affiliation | Academia | Xuanfa Jin¹,³, Ziyan Wang², Yali Du², Meng Fang⁴, Haifeng Zhang¹,³,⁵, Jun Wang⁶. ¹Institute of Automation, Chinese Academy of Sciences; ²Cooperative AI Lab, Department of Informatics, King's College London; ³School of Artificial Intelligence, University of Chinese Academy of Sciences; ⁴University of Liverpool; ⁵Nanjing Artificial Intelligence Research of IA; ⁶AI Centre, Department of Computer Science, UCL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The project page of our paper: one-night-ultimate-werewolf.github.io. |
| Open Datasets | No | Given the scarcity of datasets containing human players engaged in the ONUW game, we opt to leverage game logs generated by LLMs, which is the most effective way to collect trajectories for offline RL training. |
| Dataset Splits | No | The paper collects game logs for training, but does not explicitly provide training/validation/test dataset splits with specific percentages or counts. |
| Hardware Specification | Yes | The training of the discussion policy takes an NVIDIA GeForce RTX 3060 Ti GPU for about 2.5 hours. |
| Software Dependencies | No | text-embedding-ada-002 (https://platform.openai.com/docs/models) is adopted to get the state embeddings, and we utilize CQL [45] to train the discussion policy. More training details can be found in Appendix E.2. We repeat the game 30 times and report the final results for each evaluation. (A hedged embedding sketch follows the table.) |
| Experiment Setup | Yes | The hyperparameters we used for CQL are listed in Table 3 (if not listed, use the default values). Table 3 (training hyperparameters for CQL): learning rate 5e-5; discount factor (γ) 0.99; mini-batch size 32; trade-off factor (ρ) 4.0; number of critics 2; target critic update interval 1000; number of epochs 100; steps per epoch 5000; state dim 1536; action dim 6. (A hedged training sketch follows the table.) |
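
The Software Dependencies row quotes the paper's use of text-embedding-ada-002 to produce state embeddings for the discussion policy. Below is a minimal sketch of how such embeddings could be obtained with the OpenAI Python client; the helper name `embed_state`, its interface, and the example input are our assumptions, not the authors' code (their implementation is linked from the project page).

```python
# Minimal sketch: embedding a game-state description with
# text-embedding-ada-002 (1536-dim), as quoted in the paper.
# The function name and interface are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_state(state_text: str) -> list[float]:
    """Return a 1536-dim embedding for one game-state string."""
    resp = client.embeddings.create(
        model="text-embedding-ada-002",
        input=state_text,
    )
    return resp.data[0].embedding

# Example with a hypothetical ONUW discussion state.
vec = embed_state("Player 1 claims Seer and says Player 3 is a Werewolf.")
assert len(vec) == 1536  # matches the state dim in Table 3
```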
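
The Experiment Setup row lists the CQL hyperparameters from Table 3. As a reading aid, here is a minimal PyTorch sketch of discrete-action conservative Q-learning wired up with those values (state dim 1536, action dim 6, two critics, target update every 1000 steps, conservative weight ρ = 4.0). Only the hyperparameter values come from the table; the critic architecture, clipped double-Q target, and loop scaffolding are our assumptions, not the authors' implementation.

```python
# Hedged sketch of discrete-action CQL using Table 3's hyperparameters.
# Only the constants are from the paper; the rest is illustrative.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 1536, 6   # ada-002 embedding -> 6 discussion actions
GAMMA, RHO = 0.99, 4.0            # discount, conservative trade-off factor
LR, BATCH = 5e-5, 32
N_CRITICS, TARGET_UPDATE = 2, 1000

def make_critic() -> nn.Module:
    # Hidden sizes are an assumption; the paper does not list them here.
    return nn.Sequential(
        nn.Linear(STATE_DIM, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, ACTION_DIM),
    )

critics = nn.ModuleList([make_critic() for _ in range(N_CRITICS)])
targets = copy.deepcopy(critics).requires_grad_(False)
opt = torch.optim.Adam(critics.parameters(), lr=LR)

def cql_step(step, s, a, r, s2, done):
    """One CQL update on an offline batch (s, a, r, s2, done)."""
    with torch.no_grad():
        # Clipped double-Q target: elementwise min over the target critics.
        q_next = torch.stack([t(s2) for t in targets]).min(0).values
        y = r + GAMMA * (1.0 - done) * q_next.max(dim=1).values
    loss = 0.0
    for critic in critics:
        q = critic(s)                                 # (BATCH, ACTION_DIM)
        q_a = q.gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) for data actions
        td = F.mse_loss(q_a, y)
        # Conservative penalty: push down logsumexp Q, push up data actions.
        conservative = (torch.logsumexp(q, dim=1) - q_a).mean()
        loss = loss + td + RHO * conservative
    opt.zero_grad(); loss.backward(); opt.step()
    if step % TARGET_UPDATE == 0:
        targets.load_state_dict(critics.state_dict())
    return loss.item()
```

Per Table 3, training would run 100 epochs of 5000 steps each, i.e. 500,000 calls to a step like `cql_step`, on minibatches of 32 transitions drawn from the LLM-generated game logs.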