Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Hindsight Trust Region Policy Optimization
Authors: Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng
IJCAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | HTRPO has been evaluated on various sparse-reward tasks, including Atari games and simulated robot control. Results show that HTRPO consistently outperforms TRPO, as well as HPG, a state-of-the-art policy gradient algorithm for RL with sparse rewards. |
| Researcher Affiliation | Academia | 1Xi'an Jiaotong University 2National University of Singapore EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | The complete algorithm of HGF and HTRPO is presented in Appendix E. |
| Open Source Code | No | The paper references third-party baselines (OpenAI Baselines) but does not provide a specific link or explicit statement for the open-source code of their proposed HTRPO method. |
| Open Datasets | Yes | Firstly, we test HTRPO in simple benchmarks established in previous work [Andrychowicz et al., 2017] including 4-to-100-Bit Flipping tasks. Secondly, we verify HTRPO's performance in Atari games like Ms. Pac-Man [Bellemare et al., 2013] with complex raw image input... Finally, we test HTRPO in simulated robot control tasks like Reach, Push, Slide and Pick And Place in Fetch [Plappert et al., 2018] robot environment. |
| Dataset Splits | No | The paper mentions using benchmark environments but does not explicitly describe training, validation, or test splits or how they are used. |
| Hardware Specification | Yes | All experiments are conducted on a platform with NVIDIA GeForce GTX 1080Ti. |
| Software Dependencies | No | The paper mentions using DQN and DDPG based on OpenAI Baselines but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Detailed settings of hyperparameters are listed in Appendix F.2. |