Feasibility Consistent Representation Learning for Safe Reinforcement Learning
Authors: Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method is capable of learning a better safety-aware embedding and achieving superior performance to previous representation learning baselines. |
| Researcher Affiliation | Academia | Zhepeng Cen 1 Yihang Yao 1 Zuxin Liu 1 Ding Zhao 1 1 Carnegie Mellon University. Correspondence to: Zhepeng Cen <zcen@andrew.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1 Feasibility Consistent Safe RL |
| Open Source Code | No | The project website is available at https://sites.google.com/view/FCSRL. This is a project website, not a direct link to a code repository, and the paper's text contains no explicit statement of code release. |
| Open Datasets | Yes | To answer the above questions, we use 6 vector-state and 3 image-based continuous robotic control tasks as our testbeds adopted from safety-gymnasium (Ji et al., 2023), a widely used evaluation benchmark by previous work for safe RL (Liu et al., 2023). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit mention of validation set usage for hyperparameter tuning) beyond referring to environment steps and training curves. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU or CPU models, memory specifications, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions "Adam" as an optimizer but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | Table 6: The hyperparameters adopted in experiments. It lists specific values such as "NN learning rate 3e-4", "discount factor γ 0.99", "prediction length K 4", and "PID coefficient for Lagrangian [Kp, Ki, Kd] [0.02, 0.005, 0.01]". |
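To make the PID Lagrangian coefficients in the hyperparameter row concrete, here is a minimal sketch of a PID-controlled Lagrange multiplier update as commonly used in safe RL. The class name, interface, and the `cost_limit` value are illustrative assumptions, not taken from the paper; only the coefficients [Kp, Ki, Kd] = [0.02, 0.005, 0.01] come from Table 6.

```python
class PIDLagrangian:
    """Sketch: adjust the safety penalty weight (Lagrange multiplier)
    from the episode-cost-vs-limit error using PID control.

    Coefficients follow the paper's Table 6; cost_limit is an assumed
    placeholder, not a value stated in this report.
    """

    def __init__(self, kp=0.02, ki=0.005, kd=0.01, cost_limit=25.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0   # accumulated constraint violation (clipped at 0)
        self.prev_cost = 0.0  # previous episode cost, for the derivative term

    def update(self, episode_cost):
        """Return the nonnegative multiplier for the next policy update."""
        error = episode_cost - self.cost_limit
        self.integral = max(0.0, self.integral + error)
        derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # lambda >= 0 scales the cost term added to the policy loss
        return max(0.0, self.kp * error
                        + self.ki * self.integral
                        + self.kd * derivative)
```

A typical usage pattern is to call `update` once per training iteration with the mean episode cost, then weight the cost critic's advantage by the returned multiplier in the policy objective.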