Gradient-Adaptive Pareto Optimization for Constrained Reinforcement Learning
Authors: Zixian Zhou, Mengda Huang, Feiyang Pan, Jia He, Xiang Ao, Dandan Tu, Qing He
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the Safety Gym benchmarks show that our method consistently outperforms previous CRL methods in reward while satisfying the constraints. |
| Researcher Affiliation | Collaboration | 1 Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; 2 University of Chinese Academy of Sciences, Beijing 100049, China; 3 Huawei EI Innovation Lab |
| Pseudocode | Yes | We introduce how GCPO trains an agent and modifies t in a pseudo-code, which can be found in Appendix E. |
| Open Source Code | No | The paper neither states that source code will be released nor links to a code repository for its method. |
| Open Datasets | Yes | To validate our proposed algorithm, we conduct experiments on Safety Gym (Ray, Achiam, and Amodei 2019), a CRL benchmark. |
| Dataset Splits | No | The paper mentions '5 runs (1000 episodes each, 10000 steps each episode) with different random seeds' and 'test runs' but does not specify a separate validation dataset split. |
| Hardware Specification | No | The paper states that experiments are conducted in the Safety Gym environment simulated in MuJoCo, but it does not specify the hardware (e.g., CPU, GPU, memory) used for simulation or training. |
| Software Dependencies | No | The paper mentions using 'Actor-critic-based PPO (Schulman et al. 2015) as the base model of GCPO' and 'Mujoco (Todorov, Erez, and Tassa 2012)', but it does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For reproducibility, a detailed statement about the architectures and hyper-parameters is presented in Appendix F. |