Goal-Conditioned On-Policy Reinforcement Learning
Authors: Xudong Gong, Feng Dawei, Kele Xu, Bo Ding, Huaimin Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that GCPO is capable of effectively addressing both multi-goal MR [Markovian reward] and NMR [non-Markovian reward] problems. |
| Researcher Affiliation | Academia | ¹ College of Computer, National University of Defense Technology, Changsha, Hunan, China; ² State Key Laboratory of Complex & Critical Software Environment, Changsha, Hunan, China |
| Pseudocode | Yes | The overall GCPO framework is depicted in Fig. 2 and a practical implementation of GCPO is detailed in Algorithm 1. (A generic goal-conditioned training sketch follows this table.) |
| Open Source Code | Yes | Please refer to https://github.com/GongXudong/GCPO. |
| Open Datasets | Yes | Experiments are conducted on the Fixed-Wing UAV Velocity Vector Control (VVC) task [26], which is a representative multi-goal problem. [...] The demonstrations for Point Maze are sourced from Minari [18] (pointmaze-large-v1), while the demonstrations for Reach are generated by us, with reference to the PID controller as described in the official documentation [44]. (A Minari loading sketch follows this table.) |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into explicit train/validation/test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The Imitation framework is utilized to implement the BC algorithm... and the Stable Baselines3 framework for PPO... (without specific version numbers). (A version-pinned usage sketch follows this table.) |
| Experiment Setup | Yes | Table 6: Parameters used in BC (a) and PPO (b) [listing specific hyperparameters like `batch_size`, `epochs`, `lr`, etc.]. |
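
The paper's Algorithm 1 is not reproduced on this page, so the sketch below shows only the generic goal-conditioned on-policy pattern it builds on: the desired goal is folded into the observation and an on-policy learner (PPO here) is trained on the result. The wrapper, the Gymnasium-Robotics environment id, and all hyperparameters are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of goal-conditioned on-policy training; NOT the paper's
# Algorithm 1. It only illustrates concatenating the goal into the observation
# and running an on-policy learner on the flattened result.
import gymnasium as gym
import gymnasium_robotics
import numpy as np
from stable_baselines3 import PPO

gym.register_envs(gymnasium_robotics)  # Gymnasium >= 1.0; older versions register on import


class GoalToObsWrapper(gym.ObservationWrapper):
    """Flatten a goal-conditioned dict observation into one vector."""

    def __init__(self, env):
        super().__init__(env)
        spaces = env.observation_space.spaces
        low = np.concatenate([spaces["observation"].low, spaces["desired_goal"].low])
        high = np.concatenate([spaces["observation"].high, spaces["desired_goal"].high])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        return np.concatenate([obs["observation"], obs["desired_goal"]]).astype(np.float32)


# Stand-in environment: the paper's fixed-wing VVC task is not runnable from here.
env = GoalToObsWrapper(gym.make("PointMaze_Large-v3"))
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```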
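
For the Point Maze demonstrations attributed to Minari's pointmaze-large-v1, a reproducer would start from Minari's public download/load API. A minimal sketch, assuming a current `minari` install; newer Minari releases have regrouped the D4RL-era datasets under ids like `D4RL/pointmaze/large-v2`, so the exact id may need adjusting:

```python
# Minimal sketch: fetch and iterate the Point Maze demonstrations via Minari.
# The dataset id is quoted from the paper; adjust it for newer Minari releases.
import minari

minari.download_dataset("pointmaze-large-v1")  # one-time fetch into the local cache
dataset = minari.load_dataset("pointmaze-large-v1")

print(f"{dataset.total_episodes} episodes, {dataset.total_steps} steps")
for episode in dataset.iterate_episodes():
    # For goal-conditioned envs, `observations` is a dict of arrays
    # ("observation", "desired_goal", ...); `actions` is a plain array.
    print(episode.actions.shape)
    break
```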
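
Since the Software Dependencies row flags the missing version numbers, a reproduction would pin them explicitly. The sketch below shows the standard entry points of the two reported frameworks, imitation's BC and Stable-Baselines3's PPO, with placeholder values standing in for Table 6's `batch_size`, `epochs`, and `lr`; the environment, the pinned versions, and all numbers are assumptions, not the paper's settings.

```python
# Sketch of the reported toolchain: imitation's BC plus Stable-Baselines3's PPO.
# Pin versions in requirements.txt, e.g. (assumed; the paper gives none):
#   stable-baselines3==2.3.2
#   imitation==1.0.0
import gymnasium as gym
import numpy as np
from imitation.algorithms import bc
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")  # stand-in env; the paper's tasks differ
rng = np.random.default_rng(0)

# BC pretraining; `demonstrations` would hold the Minari / PID trajectories.
bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=None,  # supply transitions before calling train()
    rng=rng,
    batch_size=32,        # placeholder for Table 6's `batch_size`
)
# bc_trainer.train(n_epochs=10)  # placeholder for Table 6's `epochs`

# On-policy fine-tuning with PPO; learning_rate stands in for Table 6's `lr`.
model = PPO("MlpPolicy", env, learning_rate=3e-4, verbose=1)
model.learn(total_timesteps=50_000)
```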