HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation
Authors: Boyan Li, Hongyao Tang, Yan Zheng, Jianye Hao, Pengyi Li, Zhen Wang, Zhaopeng Meng, Li Wang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate HyAR in a variety of environments with discrete-continuous action space. The results demonstrate the superiority of HyAR when compared with previous baselines, especially for high-dimensional action spaces. |
| Researcher Affiliation | Academia | 1 College of Intelligence and Computing, Tianjin University; 2 School of Artificial Intelligence, Optics and Electronics (iOPEN) and School of Cybersecurity, Northwestern Polytechnical University |
| Pseudocode | Yes | Algorithm 1 describes the pseudo-code of HyAR-TD3, containing two major stages: (1) the warm-up stage and (2) the training stage. An illustrative sketch of this two-stage structure appears after the table. |
| Open Source Code | Yes | For reproducibility, codes are provided in the supplementary material. |
| Open Datasets | Yes | Benchmarks Fig. 4 visualizes the evaluation benchmarks, including the Platform and Goal from (Masson et al., 2016), Catch Point from (Fan et al., 2019), and a newly designed Hard Move specific to the evaluation in larger hybrid action space. We also build a complex version of Goal, called Hard Goal. All benchmarks have hybrid actions and require the agent to select reasonable actions to complete the task. See complete description of benchmarks in Appendix B.1. |
| Dataset Splits | No | The paper describes a "warm-up stage" for pre-training representation models and a "training stage" for policy learning. It does not mention explicit validation set splits (e.g., 80/10/10) or their use for hyperparameter tuning in the conventional sense. |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA GeForce GTX 2080Ti GPU. |
| Software Dependencies | Yes | Our codes are implemented with Python 3.7.9 and Torch 1.7.1. |
| Experiment Setup | Yes | Complete details of setups are provided in Appendix B. For all experiments, we give each baseline the same training budget. For our algorithms, we use a random strategy to interact with the environment for 5000 episodes during the warm-up stage. For each experiment, we run 5 trials and report the average results. Table 5 shows the common hyperparameters of algorithms used in all our experiments. |
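
For context on the two-stage procedure referenced in the Pseudocode and Experiment Setup rows, below is a minimal sketch of a HyAR-style hybrid action representation and its warm-up/training split. The network sizes, loss weighting, the `HybridActionRepresentation` module, and the nearest-neighbour decoding of the discrete action are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a HyAR-style hybrid action representation.
# Assumptions: dimensions, loss weights, and decoding below are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, NUM_DISCRETE, PARAM_DIM, EMB_DIM, LATENT_DIM = 8, 4, 2, 6, 3


class HybridActionRepresentation(nn.Module):
    """Discrete-action embedding table plus a conditional VAE for the
    continuous parameter, pre-trained during the warm-up stage."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(NUM_DISCRETE, EMB_DIM)           # embedding e_k
        self.enc = nn.Sequential(
            nn.Linear(STATE_DIM + EMB_DIM + PARAM_DIM, 64), nn.ReLU(),
            nn.Linear(64, 2 * LATENT_DIM))                       # outputs mu, log_var
        self.dec = nn.Sequential(
            nn.Linear(STATE_DIM + EMB_DIM + LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, PARAM_DIM))                            # reconstructs x_k

    def vae_loss(self, s, k, x):
        e = self.emb(k)
        mu, log_var = self.enc(torch.cat([s, e, x], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()    # reparameterisation
        x_hat = self.dec(torch.cat([s, e, z], -1))
        recon = F.mse_loss(x_hat, x)
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).mean()
        return recon + 0.5 * kl

    def decode(self, s, e_latent, z_latent):
        # Nearest-neighbour lookup in the embedding table recovers the discrete
        # action; the decoder recovers its continuous parameter.
        dist = torch.cdist(e_latent, self.emb.weight)             # (batch, K)
        k = dist.argmin(-1)
        x = self.dec(torch.cat([s, self.emb(k), z_latent], -1))
        return k, x


repr_model = HybridActionRepresentation()
opt = torch.optim.Adam(repr_model.parameters(), lr=1e-3)

# Warm-up stage: transitions gathered by a random policy (random tensors stand
# in for an environment here) pre-train the representation model.
for _ in range(200):
    s = torch.randn(32, STATE_DIM)
    k = torch.randint(NUM_DISCRETE, (32,))
    x = torch.rand(32, PARAM_DIM) * 2 - 1
    loss = repr_model.vae_loss(s, k, x)
    opt.zero_grad(); loss.backward(); opt.step()

# Training stage: a TD3 actor would output (e_latent, z_latent) in the learned
# latent space; decode() maps it back to an executable hybrid action.
s = torch.randn(1, STATE_DIM)
e_latent, z_latent = torch.randn(1, EMB_DIM), torch.randn(1, LATENT_DIM)
k, x = repr_model.decode(s, e_latent, z_latent)
print("discrete action:", k.item(), "continuous parameter:", x.squeeze().tolist())
```

In the actual HyAR-TD3 training stage, the latent vectors would come from a TD3 actor acting in the learned representation space (with the representation optionally refined online), rather than from the random tensors used in this sketch.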