Feasibility Consistent Representation Learning for Safe Reinforcement Learning
Authors: Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method is capable of learning a better safety-aware embedding and achieving superior performance to previous representation learning baselines. |
| Researcher Affiliation | Academia | Zhepeng Cen 1 Yihang Yao 1 Zuxin Liu 1 Ding Zhao 1 1 Carnegie Mellon University. Correspondence to: Zhepeng Cen <zcen@andrew.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1 Feasibility Consistent Safe RL |
| Open Source Code | No | The project website is available at https://sites.google.com/view/FCSRL. This is a project website, not a direct link to a code repository, and the paper's text contains no explicit statement of code release. |
| Open Datasets | Yes | To answer the above questions, we use 6 vector-state and 3 image-based continuous robotic control tasks as our testbeds adopted from safety-gymnasium (Ji et al., 2023), a widely used evaluation benchmark by previous work for safe RL (Liu et al., 2023). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit mention of validation set usage for hyperparameter tuning) beyond referring to environment steps and training curves. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU or CPU models, memory specifications, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions "Adam" as an optimizer but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | Table 6: The hyperparameters adopted in experiments. It lists specific values such as "NN learning rate 3e-4", "discount factor γ 0.99", "prediction length K 4", and "PID coefficient for Lagrangian [Kp, Ki, Kd] [0.02, 0.005, 0.01]". |
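To make the PID Lagrangian coefficients in the hyperparameter row concrete, here is a minimal sketch of a PID-controlled Lagrange multiplier update as commonly used in safe RL. The class name, interface, and the `cost_limit` value are illustrative assumptions, not taken from the paper; only the coefficients [Kp, Ki, Kd] = [0.02, 0.005, 0.01] come from Table 6.

```python
class PIDLagrangian:
    """Sketch: adjust the safety penalty weight (Lagrange multiplier)
    from the episode-cost-vs-limit error using PID control.

    Coefficients follow the paper's Table 6; cost_limit is an assumed
    placeholder, not a value stated in this report.
    """

    def __init__(self, kp=0.02, ki=0.005, kd=0.01, cost_limit=25.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0   # accumulated constraint violation (clipped at 0)
        self.prev_cost = 0.0  # previous episode cost, for the derivative term

    def update(self, episode_cost):
        """Return the nonnegative multiplier for the next policy update."""
        error = episode_cost - self.cost_limit
        self.integral = max(0.0, self.integral + error)
        derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # lambda >= 0 scales the cost term added to the policy loss
        return max(0.0, self.kp * error
                        + self.ki * self.integral
                        + self.kd * derivative)
```

A typical usage pattern is to call `update` once per training iteration with the mean episode cost, then weight the cost critic's advantage by the returned multiplier in the policy objective.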