Goal-Conditioned On-Policy Reinforcement Learning

Authors: Xudong Gong, Feng Dawei, Kele Xu, Bo Ding, Huaimin Wang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that GCPO is capable of effectively addressing both multi-goal MR and NMR problems.
Researcher Affiliation | Academia | 1. College of Computer, National University of Defense Technology, Changsha, Hunan, China; 2. State Key Laboratory of Complex & Critical Software Environment, Changsha, Hunan, China
Pseudocode | Yes | The overall GCPO framework is depicted in Fig. 2 and a practical implementation of GCPO is detailed in Algorithm 1.
Open Source Code | Yes | Justification: Please refer to https://github.com/GongXudong/GCPO.
Open Datasets | Yes | Experiments are conducted on the Fixed-Wing UAV Velocity Vector Control (VVC) task [26], which is a representative multi-goal problem. [...] The demonstrations for Point Maze are sourced from Minari [18] (pointmaze-large-v1), while the demonstrations for Reach are generated by us, with reference to the PID controller as described in the official documentation [44]. (A loading sketch follows the table.)
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into explicit train/validation/test sets. (A fallback split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The Imitation framework is utilized to implement the BC algorithm... and the Stable Baselines3 framework for PPO... (without specific version numbers). (A version-pinned setup sketch follows the table.)
Experiment Setup | Yes | Table 6: Parameters used in BC (a) and PPO (b) [listing specific hyperparameters such as `batch_size`, `epochs`, and `lr`].
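
As a reproduction aid for the Open Datasets row, the sketch below shows one way the Point Maze demonstrations could be pulled through Minari. It is a minimal sketch, not the authors' pipeline: it assumes the `minari` package is installed, that the id `pointmaze-large-v1` still resolves (newer Minari releases renamed some remote datasets), and that flattening episodes into (observation, action) pairs is the preprocessing a BC reproduction would want.

```python
# Hedged sketch: load the pointmaze-large-v1 demonstrations from Minari and
# flatten them into (observation, action) pairs for behavior cloning.
import minari
import numpy as np

dataset = minari.load_dataset("pointmaze-large-v1", download=True)

observations, actions = [], []
for episode in dataset.iterate_episodes():
    obs = episode.observations
    if isinstance(obs, dict):
        # Point Maze exposes a dict observation space; keep the raw state part.
        obs = obs["observation"]
    observations.append(obs[:-1])  # drop final obs so lengths match actions
    actions.append(episode.actions)

observations = np.concatenate(observations)
actions = np.concatenate(actions)
print(observations.shape, actions.shape)
```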
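
The Dataset Splits row notes that no train/validation/test partitioning is reported. If a reproduction needs one anyway, an episode-level split such as the sketch below is a common default; the 90/10 ratio and the seed are assumptions, not choices taken from the paper.

```python
# Hedged sketch: an episode-level train/validation split for demonstration
# data, since the paper does not specify any partitioning. Ratio and seed
# are arbitrary assumptions.
import numpy as np

def split_episodes(episode_ids, val_fraction=0.1, seed=0):
    """Shuffle episode ids and split them into train/validation lists."""
    rng = np.random.default_rng(seed)
    ids = np.array(episode_ids)
    rng.shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    return ids[n_val:].tolist(), ids[:n_val].tolist()

train_ids, val_ids = split_episodes(range(100))
print(len(train_ids), len(val_ids))  # 90 10
```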
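
Since the Software Dependencies row flags the missing version numbers and the Experiment Setup row only points to Table 6, the following sketch shows how the named frameworks fit together for a reproduction attempt. The pinned versions, the `Pendulum-v1` placeholder environment, and every hyperparameter value are assumptions; the real settings are in the paper's Table 6 and its repository.

```python
# Hedged sketch of the toolchain named in the Software Dependencies row:
# Stable Baselines3 for PPO and the imitation library for BC.
# Versions and hyperparameters below are illustrative assumptions only, e.g.
#   pip install "stable-baselines3>=2.0" "imitation>=1.0" gymnasium
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO
from imitation.algorithms import bc

env = gym.make("Pendulum-v1")  # placeholder; the paper uses VVC, Point Maze, Reach

# PPO learner (parameters such as lr and batch_size correspond to Table 6 in
# the paper; the numbers here are placeholders, not the reported settings).
ppo = PPO("MlpPolicy", env, learning_rate=3e-4, batch_size=64, verbose=0)

# BC learner; `demonstrations` should be imitation Transitions built from the
# demonstration datasets.
bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=None,  # set later via bc_trainer.set_demonstrations(...)
    batch_size=64,        # placeholder for the paper's `batch_size`
    rng=np.random.default_rng(0),
)
```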