Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Goal-Conditioned On-Policy Reinforcement Learning

Authors: Xudong Gong, Feng Dawei, Kele Xu, Bo Ding, Huaimin Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that GCPO is capable of effectively addressing both multi-goal MR and NMR problems.
Researcher Affiliation | Academia | (1) College of Computer, National University of Defense Technology, Changsha, Hunan, China; (2) State Key Laboratory of Complex & Critical Software Environment, Changsha, Hunan, China
Pseudocode | Yes | The overall GCPO framework is depicted in Fig. 2 and a practical implementation of GCPO is detailed in Algorithm 1.
Open Source Code | Yes | Justification: Please refer to https://github.com/GongXudong/GCPO.
Open Datasets | Yes | Experiments are conducted on the Fixed-Wing UAV Velocity Vector Control (VVC) task [26], which is a representative multi-goal problem. [...] The demonstrations for Point Maze are sourced from Minari [18] (pointmaze-large-v1), while the demonstrations for Reach are generated by us, with reference to the PID controller as described in the official documentation [44].
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into explicit train/validation/test sets.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The Imitation framework is utilized to implement BC algorithm... and the Stable Baselines3 framework for PPO... (without specific version numbers).
Experiment Setup | Yes | Table 6: Parameters used in BC (a) and PPO (b) [listing specific hyperparameters like `batch_size`, `epochs`, `lr`, etc.].
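The Hardware Specification and Software Dependencies rows are marked No. As an illustration only, the following Python sketch shows one way a rerun could record those details in the same file as the BC and PPO hyperparameters named in Table 6 (`batch_size`, `epochs`, `lr`). Everything here is an assumption: the manifest layout and the names `TrainingConfig`, `RunManifest`, `package_version`, and `build_manifest` are hypothetical, the pip distribution names `stable-baselines3` and `imitation` are assumed to correspond to the Stable Baselines3 and Imitation frameworks, and the numeric values are placeholders, not the values reported in the paper.

```python
import json
import os
import platform
import sys
from dataclasses import asdict, dataclass, field
from importlib import metadata
from typing import Dict, List, Optional


def package_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


@dataclass
class TrainingConfig:
    # Hypothetical container; field names mirror those listed for Table 6.
    algorithm: str
    batch_size: int
    epochs: int
    lr: float


@dataclass
class RunManifest:
    # Hypothetical record bundling hyperparameters with software/hardware details.
    configs: List[dict]
    packages: Dict[str, Optional[str]]
    python_version: str = field(default_factory=platform.python_version)
    os_platform: str = field(default_factory=platform.platform)
    processor: str = field(default_factory=platform.processor)  # may be "" on some systems
    cpu_count: Optional[int] = field(default_factory=os.cpu_count)


def build_manifest(configs, dist_names=("stable-baselines3", "imitation")) -> RunManifest:
    """Collect library versions and machine details alongside the given configs."""
    packages = {name: package_version(name) for name in dist_names}
    return RunManifest(configs=[asdict(c) for c in configs], packages=packages)


if __name__ == "__main__":
    # Placeholder values for illustration; NOT the hyperparameters reported in Table 6.
    manifest = build_manifest([
        TrainingConfig("BC", batch_size=256, epochs=10, lr=1e-3),
        TrainingConfig("PPO", batch_size=64, epochs=10, lr=3e-4),
    ])
    json.dump(asdict(manifest), sys.stdout, indent=2)
```

Keeping exact library versions and machine details in the same record as the hyperparameters would let a later run check whether it used the same setup. `package_version` returns `None` instead of raising when a library is not installed, so the manifest can still be written on a machine that lacks one of the frameworks.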