Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Authors: Haotian Fu, Hongyao Tang, Jianye Hao, Zihan Lei, Yingfeng Chen, Changjie Fan

IJCAI 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our empirical results on several challenging tasks (simulated Robo Cup Soccer and game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform existing independent deep parameterized Q-learning method. |
| Researcher Affiliation | Collaboration | Haotian Fu¹, Hongyao Tang¹, Jianye Hao¹, Zihan Lei², Yingfeng Chen², Changjie Fan² (¹College of Intelligence and Computing, Tianjin University; ²Fuxi AI Lab in Netease). {haotianfu, bluecontra, jianye.hao}@tju.edu.cn, {leizihan, chenyingfeng1, fanchangjie}@corp.netease.com |
| Pseudocode | No | The paper describes the steps of the algorithms in paragraph form and through equations, but does not include a dedicated pseudocode block or algorithm listing (a hedged sketch of the hybrid action-selection step appears below the table). |
| Open Source Code | No | The paper links to supplementary material on the mixing network structure and experimental settings, and to a video of learned policies, but does not state that the source code for its methodology is open-source or provide a link to it (an illustrative mixing-network sketch appears below the table). |
| Open Datasets | Yes | In this section, we evaluate our algorithms in 1) the standard benchmark game HFO, 2) 3v3 mode in a large-scale online video game Ghost Story. Half Field Offense (HFO) is an abstraction of full Robo Cup 2D game. Previous work [Hausknecht and Stone, 2016; Wang et al., 2018; Wei et al., 2018b] applied RL to the single-agent version of HFO... A full list of state information can be found at the official website https://github.com/mhauskn/HFO/blob/master/doc/manual.pdf. |
| Dataset Splits | No | The paper discusses training and execution phases, but does not provide specific details on training/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | The actual training time of Deep MAPQN is about three days, while Deep MAHHQN takes less than one day to train on the same NVidia GeForce GTX 1080Ti GPU. |
| Software Dependencies | No | The paper does not specify any software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers. |
| Experiment Setup | No | The paper describes the reward functions and the training coordination between high-level and low-level networks, but does not provide specific hyperparameters such as learning rate, batch size, or optimizer settings. |
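
Since the paper gives no pseudocode, the following minimal sketch illustrates the hybrid action-selection step in the style of deep parameterized Q-learning (P-DQN), the single-agent approach that Deep MAPQN extends: an actor network proposes continuous parameters for every discrete action, and a Q-network scores each discrete action paired with its parameters. The class names (`ParamActor`, `HybridQNet`), layer sizes, and toy dimensions are illustrative assumptions, not the authors' actual architecture.

```python
# Sketch of P-DQN-style hybrid (discrete + continuous) action selection.
# All sizes and names are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn

class ParamActor(nn.Module):
    """Maps a state to one continuous parameter vector per discrete action."""
    def __init__(self, state_dim, n_discrete, param_dim):
        super().__init__()
        self.n_discrete, self.param_dim = n_discrete, param_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_discrete * param_dim), nn.Tanh(),  # params in [-1, 1]
        )

    def forward(self, state):
        return self.net(state).view(-1, self.n_discrete, self.param_dim)

class HybridQNet(nn.Module):
    """Scores Q(s, k, x_k) for every discrete action k given its parameters."""
    def __init__(self, state_dim, n_discrete, param_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_discrete * param_dim, 64), nn.ReLU(),
            nn.Linear(64, n_discrete),
        )

    def forward(self, state, all_params):
        flat = all_params.flatten(start_dim=1)
        return self.net(torch.cat([state, flat], dim=-1))  # (batch, n_discrete)

def select_hybrid_action(state, actor, qnet):
    """Pick argmax_k Q(s, k, x_k(s)) and return the pair (k, x_k)."""
    with torch.no_grad():
        params = actor(state)           # (1, n_discrete, param_dim)
        q_values = qnet(state, params)  # (1, n_discrete)
        k = q_values.argmax(dim=-1).item()
        return k, params[0, k]

# Toy usage with made-up dimensions:
actor = ParamActor(state_dim=8, n_discrete=3, param_dim=2)
qnet = HybridQNet(state_dim=8, n_discrete=3, param_dim=2)
k, x_k = select_hybrid_action(torch.randn(1, 8), actor, qnet)
print(k, x_k)
```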
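
The mixing network itself is only described in the paper's supplementary material, which is not reproduced here. As a stand-in, the sketch below follows the standard QMIX-style monotonic mixing construction that Deep MAPQN builds on: hypernetworks conditioned on the global state generate non-negative weights for combining per-agent Q-values into a joint value. All layer sizes are assumptions, and this should be read as an illustration of the general technique rather than the authors' exact network.

```python
# Illustrative QMIX-style monotonic mixing network (not the paper's exact one).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixingNet(nn.Module):
    """Combines per-agent Q-values into Q_tot, monotonic in each input."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: the global state generates the mixing weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        batch = agent_qs.size(0)
        qs = agent_qs.view(batch, 1, self.n_agents)
        # abs() keeps weights non-negative, so Q_tot is monotone in each agent's Q.
        w1 = self.hyper_w1(state).abs().view(batch, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(batch, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(qs, w1) + b1)               # (batch, 1, embed_dim)
        w2 = self.hyper_w2(state).abs().view(batch, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(batch, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(batch, 1)   # Q_tot: (batch, 1)

# Toy usage (3 agents, illustrative dimensions):
mixer = MixingNet(n_agents=3, state_dim=16)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 16))
print(q_tot.shape)  # torch.Size([4, 1])
```

The monotonicity constraint (non-negative mixing weights) is what lets each agent greedily maximize its own Q-value while still maximizing the joint value, which is the standard rationale for this family of mixing networks.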