Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

Authors: Chen Zhang, Qiang He, Yuan Zhou, Elvis S. Liu, Hong Wang, Jian Zhao, Yang Wang

ICML 2024 | Conference PDF | Archive PDF

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section, we present the experimental results of HELT, using Naruto Mobile as the evaluation platform." |
| Researcher Affiliation | Collaboration | (1) University of Science and Technology of China (USTC), Hefei, China; (2) Tencent Games; (3) Institute of Automation, Chinese Academy of Sciences, China |
| Pseudocode | No | The paper describes its algorithms but does not present them in a pseudocode block or an explicitly labeled "Algorithm" section. |
| Open Source Code | No | The paper does not state that the source code for the method is available, nor does it link to a code repository. |
| Open Datasets | No | The paper describes gameplay data from Naruto Mobile ("we collected a significant amount of player gameplay data"; "a sample of 40 million game matches from player gameplay data") but provides no concrete access information, such as a link, DOI, or formal citation for a public dataset. |
| Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits. It mentions a "training subset" but defines no validation split. |
| Hardware Specification | Yes | "In our experimental setup, all agents were trained using 4 NVIDIA T4 GPUs and 3000 CPU cores. The league training consisted of a main agent, a main exploiter, and a league exploiter. A total of 12 GPUs and 9000 CPU cores were utilized for each league training session." |
| Software Dependencies | No | The paper mentions using the PPO algorithm and reports experimental parameters in Table 2, but it lists no specific software dependencies with version numbers (e.g., programming language, deep learning framework, or libraries). |
| Experiment Setup | Yes | "In our experimental setup, all agents were trained using 4 NVIDIA T4 GPUs and 3000 CPU cores. ... Table 2. Experimental Parameters: n-steps 100, batch size 5120, γ 0.995, λ 0.95, learning rate 2e-4, actor number 1000, env number per actor 10, learner number 2, CPU core num 9000, GPU per learner 0.5." |
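
The hardware row describes an AlphaStar-style league of three roles (main agent, main exploiter, league exploiter), each allocated 4 NVIDIA T4 GPUs and 3000 CPU cores, for the stated total of 12 GPUs and 9000 cores per league session. Since no code is released, the following is only a minimal sketch of how such a league could be wired up; the class, field names, and matchmaking rule are our assumptions, not the paper's implementation.

```python
import random
from dataclasses import dataclass, field

@dataclass
class LeagueMember:
    """One training seat in the league; resource figures are from the paper."""
    role: str                      # 'main', 'main_exploiter', or 'league_exploiter'
    gpus: int = 4                  # 4 NVIDIA T4 GPUs per agent (paper)
    cpu_cores: int = 3000          # 3000 CPU cores per agent (paper)
    past_checkpoints: list = field(default_factory=list)  # frozen snapshots for reuse

def pick_opponent(member: LeagueMember, league: list[LeagueMember]) -> LeagueMember:
    # Illustrative matchmaking only, not the paper's scheme: exploiters
    # target the main agent, while the main agent mixes self-play with
    # games against any league member.
    if member.role != 'main':
        return next(m for m in league if m.role == 'main')
    return random.choice(league)

league = [LeagueMember('main'),
          LeagueMember('main_exploiter'),
          LeagueMember('league_exploiter')]
# Totals match the paper: 3 roles x 4 GPUs = 12 GPUs, 3 x 3000 = 9000 CPU cores.
```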
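
Likewise, the Table 2 values map onto standard PPO settings. Below is a hedged sketch of a configuration object using those numbers (the field names are illustrative, since the paper ships no code) together with a generic generalized advantage estimation routine, which is what the γ = 0.995 and λ = 0.95 entries parameterize; it is not the authors' implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PPOConfig:
    # Values copied from Table 2; field names are our own.
    n_steps: int = 100            # rollout length per update
    batch_size: int = 5120
    gamma: float = 0.995          # discount factor
    gae_lambda: float = 0.95      # GAE smoothing parameter
    learning_rate: float = 2e-4
    num_actors: int = 1000
    envs_per_actor: int = 10      # 1000 actors x 10 envs = 10,000 parallel envs
    num_learners: int = 2
    gpu_per_learner: float = 0.5

def gae_advantages(rewards, values, dones, gamma=0.995, lam=0.95):
    """Standard GAE(gamma, lambda); `values` carries one extra bootstrap entry."""
    adv = np.zeros_like(rewards)
    last = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv
```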