Exterior Penalty Policy Optimization with Penalty Metric Network under Constraints
Authors: Shiqing Gao, Jiaxin Ding, Luoyi Fu, Xinbing Wang, Chenghu Zhou
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted, showing that EPO outperforms the baselines in terms of policy performance and constraint satisfaction with a stable training process, particularly on complex tasks. |
| Researcher Affiliation | Academia | Shiqing Gao, Jiaxin Ding, Luoyi Fu, Xinbing Wang and Chenghu Zhou, Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1 EPO: Exterior Penalty Policy Optimization |
| Open Source Code | No | The paper does not contain an explicit statement offering open-source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | We train different agents and design comparison experiments in four navigation tasks based on Safety Gymnasium [Brockman et al., 2016] and four MuJoCo physical simulator tasks [Todorov et al., 2012]. |
| Dataset Splits | No | The paper mentions 'training steps' but does not specify exact training, validation, or test dataset splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as PPO and the MuJoCo environments, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | No | Algorithm 1 lists hyperparameters that must be set (e.g., the PPO clip rate, µ and α for the penalty function, and the learning rate η), but the paper does not provide their specific numerical values or other concrete details of the experimental setup in the main text. |
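
For context on the hyperparameters named above, the following is a minimal sketch of how an exterior penalty term can be combined with a PPO clipped surrogate. It is not the paper's implementation: the quadratic penalty form, the function names, and the default values of µ and α are assumptions, and EPO's learned penalty metric network is omitted entirely.

```python
import torch

def ppo_clip_loss(ratio, advantage, clip_rate=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize."""
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_rate, 1 + clip_rate) * advantage
    return -torch.min(unclipped, clipped).mean()

def exterior_penalty(cost_return, threshold, mu=10.0, alpha=2.0):
    """Generic exterior penalty (an assumption, not EPO's exact form):
    zero inside the feasible region, growing as mu * violation**alpha
    once the expected cost exceeds the constraint threshold."""
    violation = torch.clamp(cost_return - threshold, min=0.0)
    return mu * violation ** alpha

def penalized_loss(ratio, advantage, cost_return, threshold):
    # Penalized objective: reward surrogate plus a penalty on constraint
    # violation, the general structure that Algorithm 1's µ, α, clip rate,
    # and learning rate η would parameterize.
    return ppo_clip_loss(ratio, advantage) + exterior_penalty(cost_return, threshold)
```

In EPO proper, the penalty applied to violations is shaped by the penalty metric network named in the title rather than a fixed analytic function; this sketch only illustrates the exterior-penalty structure that those hyperparameters control.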