Gradient-Adaptive Pareto Optimization for Constrained Reinforcement Learning

Authors: Zixian Zhou, Mengda Huang, Feiyang Pan, Jia He, Xiang Ao, Dandan Tu, Qing He

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on the Safety Gym benchmarks show that our method consistently outperforms previous CRL methods in reward while satisfying the constraints."
Researcher Affiliation | Collaboration | "1 Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; 2 University of Chinese Academy of Sciences, Beijing 100049, China; 3 Huawei EI Innovation Lab"
Pseudocode | Yes | "We introduce how GCPO trains an agent and modifies t in a pseudo-code, which can be found in Appendix E."
Open Source Code | No | The paper does not provide an explicit statement about the release of open-source code, nor a link to a code repository for the method.
Open Datasets | Yes | "To validate our proposed algorithm, we conduct experiments on Safety Gym (Ray, Achiam, and Amodei 2019), a CRL benchmark."
Dataset Splits | No | The paper mentions "5 runs (1000 episodes each, 10000 steps each episode) with different random seeds" and "test runs" but does not specify a separate validation split.
Hardware Specification | No | The paper states that experiments are conducted in the Safety Gym environment simulated in MuJoCo, but it does not specify any details about the hardware (e.g., CPU, GPU, memory) used for simulation or training.
Software Dependencies | No | The paper mentions "Actor-critic-based PPO (Schulman et al. 2015)" as the base model of GCPO and MuJoCo (Todorov, Erez, and Tassa 2012), but it does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | "For reproducibility, a detailed statement about the architectures and hyper-parameters is presented in Appendix F."
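
Since the audit above identifies Safety Gym as the benchmark and an evaluation protocol of multiple seeded runs, the following is a minimal sketch of how one might roll out a policy on a Safety Gym task and log per-episode reward and constraint cost. It assumes the open-source `safety_gym` package (which registers its environments with OpenAI Gym) and uses a random policy as a placeholder, since the paper's GCPO implementation is not released; the environment name, seed count, and episode count here are illustrative, not the paper's exact configuration.

```python
# Minimal sketch: evaluate a policy on a Safety Gym task, tracking both
# episode reward and Safety Gym's constraint cost signal (info["cost"]).
# Assumes the open-source `safety_gym` package; the random policy is a
# placeholder for GCPO, and all counts below are illustrative.
import gym
import safety_gym  # noqa: F401 -- importing registers Safety-* envs with gym
import numpy as np


def evaluate_random_policy(env_name="Safety-PointGoal1-v0",
                           n_episodes=10, seed=0):
    """Run episodes; return per-episode (total reward, total cost) pairs."""
    env = gym.make(env_name)
    env.seed(seed)
    results = []
    for _ in range(n_episodes):
        obs = env.reset()
        ep_reward, ep_cost, done = 0.0, 0.0, False
        while not done:
            action = env.action_space.sample()  # placeholder policy
            obs, reward, done, info = env.step(action)
            ep_reward += reward
            ep_cost += info.get("cost", 0.0)  # constraint signal per step
        results.append((ep_reward, ep_cost))
    env.close()
    return results


if __name__ == "__main__":
    # Mirror the "runs with different random seeds" protocol in spirit.
    for seed in range(5):
        runs = evaluate_random_policy(seed=seed)
        rewards, costs = map(np.array, zip(*runs))
        print(f"seed {seed}: mean reward {rewards.mean():.2f}, "
              f"mean cost {costs.mean():.2f}")
```

A constrained RL method such as GCPO would be judged on both columns of this output at once: maximizing the reward while keeping the accumulated cost under the task's constraint threshold.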