Exterior Penalty Policy Optimization with Penalty Metric Network under Constraints

Authors: Shiqing Gao, Jiaxin Ding, Luoyi Fu, Xinbing Wang, Chenghu Zhou

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted, showing that EPO outperforms the baselines in terms of policy performance and constraint satisfaction with a stable training process, particularly on complex tasks.
Researcher Affiliation | Academia | Shiqing Gao, Jiaxin Ding, Luoyi Fu, Xinbing Wang and Chenghu Zhou, Shanghai Jiao Tong University
Pseudocode | Yes | Algorithm 1 EPO: Exterior Penalty Policy Optimization
Open Source Code | No | The paper does not contain an explicit statement offering open-source code for the described methodology or a direct link to a code repository.
Open Datasets | Yes | We train different agents and design comparison experiments in four navigation tasks based on Safety Gymnasium [Brockman et al., 2016] and four MuJoCo physical simulator tasks [Todorov et al., 2012].
Dataset Splits | No | The paper mentions 'training steps' but does not specify exact training, validation, or test dataset splits (e.g., percentages or counts).
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software components like 'PPO' and 'MuJoCo' environments, but it does not specify concrete version numbers for any software dependencies.
Experiment Setup | No | Algorithm 1 lists hyperparameters that need to be set (e.g., 'PPO clip rate, µ, α for penalty function and learning rate η'), but the paper does not provide the specific numerical values for these hyperparameters or other concrete details about the experimental setup in the main text.
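Although the paper does not report numeric values for these hyperparameters, the quantities named in Algorithm 1 (a PPO clip rate, penalty parameters µ and α, and a learning rate η) suggest the general shape of an exterior-penalty policy update. The sketch below is an illustrative assumption, not the authors' released implementation: it pairs a standard clipped PPO surrogate with a hand-specified polynomial exterior penalty on constraint violation, whereas EPO itself derives the penalty from a learned Penalty Metric Network. All function names and default values here are hypothetical.

```python
import torch
import torch.nn.functional as F

def epo_style_loss(logp_new, logp_old, advantages,
                   ep_cost, cost_limit,
                   clip_rate=0.2, mu=1.0, alpha=2.0):
    """Clipped PPO surrogate plus an exterior penalty on constraint violation.

    clip_rate, mu, and alpha mirror the hyperparameters named in Algorithm 1;
    their default values are placeholders, since the paper does not report them.
    """
    # Standard PPO clipped surrogate objective (to be maximised).
    ratio = torch.exp(logp_new - logp_old)
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1.0 - clip_rate, 1.0 + clip_rate) * advantages,
    ).mean()

    # Exterior penalty: zero while the estimated episodic cost satisfies the
    # limit, and growing polynomially once the constraint is violated.
    violation = F.relu(ep_cost - cost_limit)
    penalty = mu * violation.pow(alpha)

    # Minimise the negative surrogate plus the penalty term.
    return -surrogate + penalty
```

In such a setup, η would be the learning rate of the policy optimizer (e.g. torch.optim.Adam(policy.parameters(), lr=eta)), while µ and α control how sharply the penalty grows once the cost limit is exceeded.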