Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Feasibility Consistent Representation Learning for Safe Reinforcement Learning
Authors: Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines. |
| Researcher Affiliation | Academia | Zhepeng Cen 1 Yihang Yao 1 Zuxin Liu 1 Ding Zhao 1 1 Carnegie Mellon University. Correspondence to: Zhepeng Cen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Feasibility Consistent Safe RL |
| Open Source Code | No | The project website is available at https://sites.google.com/view/FCSRL. This is a project website, not a direct link to a code repository or an explicit statement of code release within the paper's text. |
| Open Datasets | Yes | To answer the above questions, we use 6 vector-state and 3 image-based continuous robotic control tasks as our testbeds adopted from safety-gymnasium (Ji et al., 2023), a widely used evaluation benchmark by previous work for safe RL (Liu et al., 2023). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit mention of validation set usage for hyperparameter tuning) beyond referring to environment steps and training curves. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU or CPU models, memory specifications, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions "Adam" as an optimizer but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | Table 6: The hyperparameters adopted in experiments. It lists specific values such as "NN learning rate 3e-4", "discount factor γ 0.99", "prediction length K 4", and "PID coefficient for Lagrangian [Kp, Ki, Kd] [0.02, 0.005, 0.01]". |