Playing FPS Games With Environment-Aware Hierarchical Reinforcement Learning

Authors: Shihong Song, Jiayi Weng, Hang Su, Dong Yan, Haosheng Zou, Jun Zhu

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that our trained bot significantly outperforms the alternative RL-based models on FPS games requiring maze solving and combat skills, etc."
Researcher Affiliation | Academia | "Shihong Song, Jiayi Weng, Hang Su, Dong Yan, Haosheng Zou and Jun Zhu. Institute for AI, Tsinghua University; Department of Computer Science and Technology, Tsinghua University; Beijing National Research Center for Information Science and Technology; Tsinghua Laboratory of Brain and Intelligence Lab; Center for Intelligent Connected Vehicles and Transportation, Tsinghua University. {songsh15,wengjy16,zouhs16}@mails.tsinghua.edu.cn; {suhangss,dcszj}@tsinghua.edu.cn; sproblvem@gmail.com"
Pseudocode | Yes | "Algorithm 1: StarNet Training" (a hedged sketch of such a manager/worker training loop appears below the table)
Open Source Code | Yes | "Video demos and further experimental details can be found at https://github.com/Trinkle23897/ViZDoom2018-Track1."
Open Datasets | No | "We use PyOblige to generate 100 maps at five different difficulty levels, yielding 20 maps per level as our training set." The paper states the authors generated their own maps for training but does not provide public access information (link, citation) for these specific generated maps. (A hedged map-generation sketch appears below the table.)
Dataset Splits | No | The paper mentions generating maps for training and evaluation, but does not explicitly describe a separate validation split or dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments.
Software Dependencies | No | The paper mentions the "PyOblige map generator" and "YOLOv3" but does not specify their version numbers. It also mentions A2C and SNAIL but no library versions.
Experiment Setup | Yes | "In each experiment, the learning rate for the manager network is 10^-3 with batch size 32. The motion worker adopts the SNAIL [Mishra et al., 2018] network architecture, with learning rate 10^-2 and batch size 32. All the other workers' architectures are the same as the manager's." (The reported hyperparameters are collected in the configuration sketch below the table.)
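For context on the Pseudocode row: the sketch below illustrates the general shape of a manager/worker hierarchy of the kind Algorithm 1 (StarNet Training) describes, where a manager scores the available workers, the selected worker emits the primitive action, and both levels are updated by policy gradient. All class names and sizes are hypothetical, the update shown is plain REINFORCE rather than the paper's A2C, and nothing here should be read as the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical minimal sketch of a manager/worker hierarchy (NOT the authors'
# StarNet code). The manager picks which worker acts; the chosen worker picks
# the primitive action; both are trained by policy gradient.

class Policy(nn.Module):
    """A small categorical policy over n_out choices."""
    def __init__(self, obs_dim, n_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_out))
    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

obs_dim, n_actions, n_workers = 128, 6, 3    # illustrative sizes
manager = Policy(obs_dim, n_workers)         # scores the workers
workers = [Policy(obs_dim, n_actions) for _ in range(n_workers)]

params = list(manager.parameters()) + [p for w in workers for p in w.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

obs = torch.randn(1, obs_dim)      # stand-in for an encoded game frame
w_dist = manager(obs)              # manager chooses a worker...
w = w_dist.sample()
a_dist = workers[w.item()](obs)    # ...and that worker chooses the action
a = a_dist.sample()

ret = torch.tensor(1.0)            # stand-in return from the environment
# Plain REINFORCE on both levels; the paper trains with A2C instead.
loss = -((w_dist.log_prob(w) + a_dist.log_prob(a)) * ret).sum()
opt.zero_grad()
loss.backward()
opt.step()
```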
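For the Open Datasets row: the training maps were produced with PyOblige, and a generation script in the spirit of that setup might look like the sketch below. The DoomLevelGenerator / set_seed / set_config / generate calls follow PyOblige's documented interface, but the config keys and the values standing in for the five difficulty levels are assumptions; the paper does not publish its generator settings.

```python
import oblige  # PyOblige, a Python wrapper around the Oblige Doom level generator

# Hypothetical difficulty presets: the paper uses five difficulty levels but
# does not list the generator options, so the "mons" values here are guesses.
DIFFICULTY_PRESETS = [
    {"mons": "scarce"},
    {"mons": "few"},
    {"mons": "some"},
    {"mons": "more"},
    {"mons": "nuts"},
]

for level, preset in enumerate(DIFFICULTY_PRESETS, start=1):
    for i in range(20):                      # 20 maps per level, 100 in total
        gen = oblige.DoomLevelGenerator()
        gen.set_seed(level * 100 + i)        # deterministic, per-map seed
        gen.set_config({"size": "regular", **preset})
        gen.generate("train_l{}_{:02d}.wad".format(level, i))
```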
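The Experiment Setup row pins down the learning rates and batch sizes; read concretely, they translate into optimizer settings like the sketch below. The stand-in modules and the choice of Adam are assumptions (the paper does not name the optimizer); only the numeric values come from the paper.

```python
import torch

# Stand-in modules; the real manager is the paper's network and the motion
# worker uses the SNAIL architecture [Mishra et al., 2018].
manager_net = torch.nn.Linear(128, 4)
motion_worker_net = torch.nn.Linear(128, 6)

BATCH_SIZE = 32  # from the paper, for both networks
manager_opt = torch.optim.Adam(manager_net.parameters(), lr=1e-3)       # 10^-3, from the paper
motion_opt = torch.optim.Adam(motion_worker_net.parameters(), lr=1e-2)  # 10^-2, from the paper
```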