Hindsight Trust Region Policy Optimization

Authors: Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | HTRPO has been evaluated on various sparse-reward tasks, including Atari games and simulated robot control. Results show that HTRPO consistently outperforms TRPO, as well as HPG, a state-of-the-art policy gradient algorithm for RL with sparse rewards.
Researcher Affiliation | Academia | 1Xi'an Jiaotong University, 2National University of Singapore; {zhanghanbo163, best99317}@stu.xjtu.edu.cn, xglan@xjtu.edu.cn, dyhsu@comp.nus.edu.sg, nnzheng@xjtu.edu.cn
Pseudocode | Yes | The complete algorithm of HGF and HTRPO is presented in Appendix E.
Open Source Code | No | The paper references third-party baselines (OpenAI Baselines) but does not provide a specific link to, or explicit statement about, open-source code for the proposed HTRPO method.
Open Datasets | Yes | Firstly, we test HTRPO in simple benchmarks established in previous work [Andrychowicz et al., 2017], including 4-to-100-Bit Flipping tasks. Secondly, we verify HTRPO's performance in Atari games like Ms. Pac-Man [Bellemare et al., 2013] with complex raw image input... Finally, we test HTRPO in simulated robot control tasks like Reach, Push, Slide and Pick And Place in the Fetch [Plappert et al., 2018] robot environment. (A minimal sketch of the bit-flipping task appears after this table.)
Dataset Splits | No | The paper mentions using benchmark environments but does not explicitly describe validation data splits or their usage.
Hardware Specification | Yes | All experiments are conducted on a platform with an NVIDIA GeForce GTX 1080 Ti.
Software Dependencies | No | The paper mentions using DQN and DDPG based on OpenAI Baselines but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Detailed settings of hyperparameters are listed in Appendix F.2.
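The 4-to-100-Bit Flipping tasks cited under Open Datasets are the sparse-reward benchmark introduced by Andrychowicz et al. (2017): the state is a length-n binary vector, each action flips one bit, and a non-penalizing reward is given only when the state exactly matches a goal bit string. Below is a minimal Python sketch of such an environment for illustration only; the class and method names are assumptions, and this is not the authors' or the HER paper's reference implementation.

```python
import numpy as np


class BitFlippingEnv:
    """Illustrative sparse-reward bit-flipping task (assumed formulation,
    following Andrychowicz et al., 2017; not the authors' code).

    The state is a binary vector of n bits; each action flips one bit.
    The reward is 0 when the state equals the goal and -1 otherwise,
    so for large n the reward signal is almost never seen by chance.
    """

    def __init__(self, n_bits=4, max_steps=None, seed=0):
        self.n_bits = n_bits
        self.max_steps = max_steps if max_steps is not None else n_bits
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Random initial bit string and random goal bit string.
        self.state = self.rng.integers(0, 2, size=self.n_bits)
        self.goal = self.rng.integers(0, 2, size=self.n_bits)
        self.steps = 0
        return self.state.copy(), self.goal.copy()

    def step(self, action):
        # action is an integer in [0, n_bits) selecting which bit to flip.
        self.state[action] ^= 1
        self.steps += 1
        success = bool(np.array_equal(self.state, self.goal))
        reward = 0.0 if success else -1.0  # sparse reward
        done = success or self.steps >= self.max_steps
        return self.state.copy(), reward, done, {"is_success": success}


if __name__ == "__main__":
    env = BitFlippingEnv(n_bits=8, seed=1)
    state, goal = env.reset()
    done = False
    while not done:
        action = int(env.rng.integers(env.n_bits))  # random policy
        state, reward, done, info = env.step(action)
    print("success:", info["is_success"])
```

With small n (e.g., 4 bits), random exploration reaches the goal often enough for a standard on-policy method such as TRPO to learn, but as n grows toward 100 the chance of hitting the goal at random vanishes, which is the regime where hindsight-based methods such as HTRPO are evaluated in the paper.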