Reward Penalties on Augmented States for Solving Richly Constrained RL Effectively
Authors: Hao Jiang, Tien Mai, Pradeep Varakantham, Huy Hoang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental Results: We experimentally answer the following questions with regards to our approaches: |
| Researcher Affiliation | Academia | Singapore Management University haojiang.2021@phdcs.smu.edu.sg, atmai@smu.edu.sg, pradeepv@smu.edu.sg, mhhoang@smu.edu.sg |
| Pseudocode | Yes | The pseudo code for the Safe DQN algorithm is provided in the appendix. [...] The detailed pseudocode for Safe SAC is provided in the appendix. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for their methodology is openly available. |
| Open Datasets | Yes | For a discrete state and discrete action environment, we consider the stochastic 2D grid world problem introduced in previous CMDP works (Leike et al. 2017; Chow et al. 2018; Satija, Amortila, and Pineau 2020; Jain, Khetarpal, and Precup 2021). [...] Next, we consider the highway environment [...] (Leurent 2018). [...] We then compare Safe SAC with recent safe methods for continuous action spaces on the two environments SafetyPointGoal1-v0, SafetyCarGoal1-v0 from Safety Gymnasium (Ji et al. 2023). |
| Dataset Splits | Yes | The performance values (expected cost and expected reward) along with the standard deviation in each experiment are averaged over 5 runs. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | We set the expected cost threshold, cmax = 2, meaning the agent could pass at most one pit. [...] We set the cmax = 8. [...] We set cmax = 15. [...] we conduct experiments on Grid World using Safe SAC, with λ1 = 1, λ2 = 5λ1, λ3 = 10λ1, a small λ4 = 0.001 and λ5 = 0 |
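
Since no source code is reported as available, the settings quoted in the table can be written down as a small configuration for anyone attempting a reproduction. The sketch below only resolves the λ values from the Experiment Setup row relative to λ1 and aggregates per-run results into a mean and standard deviation, as the Dataset Splits row describes for 5 runs. The names `penalty_weights` and `summarize_runs` are hypothetical, and the use of the sample standard deviation is an assumption; the table does not say which estimator the paper uses.

```python
from statistics import mean, stdev


def penalty_weights(lambda1=1.0):
    """Resolve the lambda values quoted in the table, relative to lambda1.

    Hypothetical helper: only the numeric relationships come from the table
    (lambda2 = 5*lambda1, lambda3 = 10*lambda1, lambda4 = 0.001, lambda5 = 0).
    """
    return {
        "lambda1": lambda1,
        "lambda2": 5 * lambda1,
        "lambda3": 10 * lambda1,
        "lambda4": 0.001,
        "lambda5": 0.0,
    }


def summarize_runs(values):
    """Return (mean, standard deviation) over per-run values.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return mean(values), stdev(values)


weights = penalty_weights()  # lambda1 = 1, as quoted in the table
```

To use it, `summarize_runs` would be called once per metric (expected cost, expected reward) with the five per-run values collected from training.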