Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment
Authors: Chen Zhang, Qiang He, Yuan Zhou, Elvis S. Liu, Hong Wang, Jian Zhao, Yang Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present the experimental results of HELT, using Naruto Mobile as the evaluation platform. |
| Researcher Affiliation | Collaboration | ¹University of Science and Technology of China (USTC), Hefei, China; ²Tencent Games; ³Institute of Automation, Chinese Academy of Sciences, China |
| Pseudocode | No | The paper describes algorithms but does not present them in a pseudocode block or explicitly labeled 'Algorithm' section. |
| Open Source Code | No | The paper does not explicitly state that the source code for their method is available or provide a link to a code repository. |
| Open Datasets | No | The paper describes using gameplay data from Naruto Mobile, stating 'we collected a significant amount of player gameplay data' and 'a sample of 40 million game matches from player gameplay data'. However, it does not provide concrete access information (link, DOI, or formal citation) for a public version of this data. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions a 'training subset' but does not define validation or test splits. |
| Hardware Specification | Yes | In our experimental setup, all agents were trained using 4 NVIDIA T4 GPUs and 3000 CPU cores. The league training consisted of a main agent, a main exploiter, and a league exploiter. A total of 12 GPUs and 9000 CPU cores were utilized for each league training session. (The league composition is restated in a sketch after this table.) |
| Software Dependencies | No | The paper mentions using the PPO algorithm and provides experimental parameters in Table 2, but it does not list specific software dependencies with version numbers (e.g., programming languages, deep learning frameworks, or libraries). |
| Experiment Setup | Yes | In our experimental setup, all agents were trained using 4 NVIDIA T4 GPUs and 3000 CPU cores. ... Table 2. Experimental Parameters: n-steps 100, Batch size 5120, γ 0.995, λ 0.95, Learning rate 2e-4, Actor number 1000, Env number per actor 10, Learner number 2, CPU core num 9000, GPU per Learner 0.5. (Transcribed as a configuration sketch after this table.) |
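
For reference, the league composition quoted in the Hardware Specification row can be restated as a short sketch. This is a minimal illustration, not the authors' code: the `LeagueMember` class and its field names are assumptions introduced here; only the role names and the resource figures (4 T4 GPUs and 3000 CPU cores per agent, 12 GPUs and 9000 cores per league session) come from the paper.

```python
from dataclasses import dataclass

@dataclass
class LeagueMember:
    """One training seat in the league (hypothetical structure)."""
    role: str        # "main_agent", "main_exploiter", or "league_exploiter"
    gpus: int        # NVIDIA T4 GPUs allocated to this seat
    cpu_cores: int   # CPU cores allocated to this seat

# The paper reports three seats per league training session.
league = [
    LeagueMember("main_agent", gpus=4, cpu_cores=3000),
    LeagueMember("main_exploiter", gpus=4, cpu_cores=3000),
    LeagueMember("league_exploiter", gpus=4, cpu_cores=3000),
]

# Totals match the paper's stated per-session budget.
assert sum(m.gpus for m in league) == 12
assert sum(m.cpu_cores for m in league) == 9000
```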
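
Similarly, the Table 2 hyperparameters from the Experiment Setup row can be transcribed into a plain configuration mapping. The values are taken verbatim from the paper; the key names follow common PPO conventions (e.g., `gae_lambda` for λ) and are assumptions, since the paper does not publish code or a config schema.

```python
# Hyperparameters as reported in the paper's Table 2.
# Key names are illustrative, not the authors' own identifiers.
ppo_config = {
    "n_steps": 100,           # rollout length per update
    "batch_size": 5120,
    "gamma": 0.995,           # discount factor γ
    "gae_lambda": 0.95,       # GAE λ
    "learning_rate": 2e-4,
    "num_actors": 1000,       # distributed actor processes
    "envs_per_actor": 10,     # parallel environments per actor
    "num_learners": 2,
    "cpu_cores": 9000,        # total CPU cores per league training session
    "gpus_per_learner": 0.5,
}
```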