Goal-Conditioned On-Policy Reinforcement Learning
Authors: Xudong Gong, Feng Dawei, Kele Xu, Bo Ding, Huaimin Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that GCPO is capable of effectively addressing both multi-goal MR [Markovian reward] and NMR [non-Markovian reward] problems. |
| Researcher Affiliation | Academia | ¹ College of Computer, National University of Defense Technology, Changsha, Hunan, China; ² State Key Laboratory of Complex & Critical Software Environment, Changsha, Hunan, China |
| Pseudocode | Yes | The overall GCPO framework is depicted in Fig. 2 and a practical implementation of GCPO is detailed in Algorithm 1. (A generic goal-conditioned training sketch follows this table.) |
| Open Source Code | Yes | Please refer to https://github.com/GongXudong/GCPO. |
| Open Datasets | Yes | Experiments are conducted on the Fixed-Wing UAV Velocity Vector Control (VVC) task [26], which is a representative multi-goal problem. [...] The demonstrations for Point Maze are sourced from Minari [18] (pointmaze-large-v1), while the demonstrations for Reach are generated by us, with reference to the PID controller as described in the official documentation [44]. (A Minari loading sketch follows this table.) |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into explicit train/validation/test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The Imitation framework is utilized to implement the BC algorithm... and the Stable Baselines3 framework for PPO... (without specific version numbers). (A version-pinned usage sketch follows this table.) |
| Experiment Setup | Yes | Table 6: Parameters used in BC (a) and PPO (b) [listing specific hyperparameters like `batch_size`, `epochs`, `lr`, etc.]. |
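
The paper's Algorithm 1 is not reproduced on this page, so the sketch below shows only the generic goal-conditioned on-policy pattern it builds on: the desired goal is folded into the observation and an on-policy learner (PPO here) is trained on the result. The wrapper, the Gymnasium-Robotics environment id, and all hyperparameters are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of goal-conditioned on-policy training; NOT the paper's
# Algorithm 1. It only illustrates concatenating the goal into the observation
# and running an on-policy learner on the flattened result.
import gymnasium as gym
import gymnasium_robotics
import numpy as np
from stable_baselines3 import PPO

gym.register_envs(gymnasium_robotics)  # Gymnasium >= 1.0; older versions register on import


class GoalToObsWrapper(gym.ObservationWrapper):
    """Flatten a goal-conditioned dict observation into one vector."""

    def __init__(self, env):
        super().__init__(env)
        spaces = env.observation_space.spaces
        low = np.concatenate([spaces["observation"].low, spaces["desired_goal"].low])
        high = np.concatenate([spaces["observation"].high, spaces["desired_goal"].high])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        return np.concatenate([obs["observation"], obs["desired_goal"]]).astype(np.float32)


# Stand-in environment: the paper's fixed-wing VVC task is not runnable from here.
env = GoalToObsWrapper(gym.make("PointMaze_Large-v3"))
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```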
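
For the Point Maze demonstrations attributed to Minari's pointmaze-large-v1, a reproducer would start from Minari's public download/load API. A minimal sketch, assuming a current `minari` install; newer Minari releases have regrouped the D4RL-era datasets under ids like `D4RL/pointmaze/large-v2`, so the exact id may need adjusting:

```python
# Minimal sketch: fetch and iterate the Point Maze demonstrations via Minari.
# The dataset id is quoted from the paper; adjust it for newer Minari releases.
import minari

minari.download_dataset("pointmaze-large-v1")  # one-time fetch into the local cache
dataset = minari.load_dataset("pointmaze-large-v1")

print(f"{dataset.total_episodes} episodes, {dataset.total_steps} steps")
for episode in dataset.iterate_episodes():
    # For goal-conditioned envs, `observations` is a dict of arrays
    # ("observation", "desired_goal", ...); `actions` is a plain array.
    print(episode.actions.shape)
    break
```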
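
Since the Software Dependencies row flags the missing version numbers, a reproduction would pin them explicitly. The sketch below shows the standard entry points of the two reported frameworks, imitation's BC and Stable-Baselines3's PPO, with placeholder values standing in for Table 6's `batch_size`, `epochs`, and `lr`; the environment, the pinned versions, and all numbers are assumptions, not the paper's settings.

```python
# Sketch of the reported toolchain: imitation's BC plus Stable-Baselines3's PPO.
# Pin versions in requirements.txt, e.g. (assumed; the paper gives none):
#   stable-baselines3==2.3.2
#   imitation==1.0.0
import gymnasium as gym
import numpy as np
from imitation.algorithms import bc
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")  # stand-in env; the paper's tasks differ
rng = np.random.default_rng(0)

# BC pretraining; `demonstrations` would hold the Minari / PID trajectories.
bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=None,  # supply transitions before calling train()
    rng=rng,
    batch_size=32,        # placeholder for Table 6's `batch_size`
)
# bc_trainer.train(n_epochs=10)  # placeholder for Table 6's `epochs`

# On-policy fine-tuning with PPO; learning_rate stands in for Table 6's `lr`.
model = PPO("MlpPolicy", env, learning_rate=3e-4, verbose=1)
model.learn(total_timesteps=50_000)
```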