Hindsight Trust Region Policy Optimization
Authors: Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | HTRPO has been evaluated on various sparse-reward tasks, including Atari games and simulated robot control. Results show that HTRPO consistently outperforms TRPO, as well as HPG, a state-of-the-art policy gradient algorithm for RL with sparse rewards. |
| Researcher Affiliation | Academia | ¹Xi'an Jiaotong University, ²National University of Singapore. {zhanghanbo163, best99317}@stu.xjtu.edu.cn, xglan@xjtu.edu.cn, dyhsu@comp.nus.edu.sg, nnzheng@xjtu.edu.cn |
| Pseudocode | Yes | The complete algorithm of HGF and HTRPO is presented in Appendix E. |
| Open Source Code | No | The paper references third-party baselines (OpenAI Baselines) but does not provide a link to, or an explicit statement about, open-source code for the proposed HTRPO method. |
| Open Datasets | Yes | Firstly, we test HTRPO in simple benchmarks established in previous work [Andrychowicz et al., 2017], including 4-to-100-Bit Flipping tasks. Secondly, we verify HTRPO's performance in Atari games like Ms. Pac-Man [Bellemare et al., 2013] with complex raw image input... Finally, we test HTRPO in simulated robot control tasks like Reach, Push, Slide and Pick-and-Place in the Fetch [Plappert et al., 2018] robot environment. |
| Dataset Splits | No | The paper mentions using benchmark environments but does not explicitly describe validation data splits or usage. |
| Hardware Specification | Yes | All experiments are conducted on a platform with an NVIDIA GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions using DQN and DDPG implementations based on OpenAI Baselines but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Detailed settings of hyperparameters are listed in Appendix F.2. |
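For context on the Bit Flipping benchmark listed under Open Datasets above: it is a standard sparse-reward task from Andrychowicz et al. [2017] in which the state and goal are n-bit vectors, each action flips one bit, and the reward is -1 at every step unless the state matches the goal. The sketch below is a minimal, illustrative environment in that style; the class name, default parameters, and the random-policy rollout are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

class BitFlippingEnv:
    """Sparse-reward n-bit flipping task in the style of Andrychowicz et al. (2017).

    Illustrative sketch only, not the paper's code: state and goal are binary
    vectors, each action flips one bit, and reward is 0 only when state == goal,
    otherwise -1.
    """

    def __init__(self, n_bits=4, max_steps=None, seed=0):
        self.n_bits = n_bits
        self.max_steps = max_steps or n_bits
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.state = self.rng.integers(0, 2, self.n_bits)
        self.goal = self.rng.integers(0, 2, self.n_bits)
        self.t = 0
        return self.state.copy(), self.goal.copy()

    def step(self, action):
        # Flip the selected bit.
        self.state[action] ^= 1
        self.t += 1
        success = np.array_equal(self.state, self.goal)
        reward = 0.0 if success else -1.0  # sparse reward signal
        done = success or self.t >= self.max_steps
        return self.state.copy(), reward, done

# A random policy almost never reaches the goal as n grows, which is why
# hindsight goal relabeling is needed on this benchmark.
env = BitFlippingEnv(n_bits=8)
state, goal = env.reset()
done = False
while not done:
    state, reward, done = env.step(env.rng.integers(env.n_bits))
```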