Hybrid Policy Optimization from Imperfect Demonstrations

Authors: Hanlin Yang, Chao Yu, Peng Sun, Siji Chen

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that HYPO significantly outperforms several baselines in various challenging tasks, such as MuJoCo with sparse rewards, Google Research Football, and the AirSim drone simulation.
Researcher Affiliation | Collaboration | Hanlin Yang (Sun Yat-sen University); Chao Yu (Sun Yat-sen University); Peng Sun (ByteDance); Siji Chen (Sun Yat-sen University)
Pseudocode | No | The paper describes the components and objectives of the HYPO algorithm in text and mathematical equations, but it does not include a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Code is available at https://github.com/joenghl/HYPO.
Open Datasets | Yes | We first perform an exhaustive evaluation of HYPO in MuJoCo (Todorov et al., 2012) with sparse rewards and Google Research Football (GRF) (Kurach et al., 2020) with a huge policy space and only a sparse score reward. We also evaluate HYPO on an Unmanned Aerial Vehicle (UAV) task based on the Unreal Engine and AirSim (Shah et al., 2018) to show the effectiveness of HYPO in addressing more challenging, high-fidelity control tasks.
Dataset Splits | No | The paper mentions environments such as MuJoCo, Google Research Football, and AirSim, but it does not specify how data were split into training, validation, or test sets (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions algorithms such as PPO and GAIL, but it does not list software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9').
Experiment Setup | No | Section 5.1 states 'Refer to Appendix C for more experimental details.', indicating that specific setup details, such as hyperparameters, are deferred to the appendix rather than given in the main text.