Towards Robust and Safe Reinforcement Learning with Benign Off-policy Data
Authors: Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Yihang Yao, Hanjiang Hu, Ding Zhao
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments on multiple robot platforms show the efficiency of SAFER in learning a robust and safe policy: achieving the same reward with far fewer constraint violations during training than on-policy baselines." Section 5 (Experiment) adds: "We consider two tasks (Run and Circle) and four robots (Ball, Car, Drone, and Ant) which have been used in many previous works as the testing ground (Achiam et al., 2017; Chow et al., 2019)." |
| Researcher Affiliation | Academia | 1Carnegie Mellon University, PA, USA. |
| Pseudocode | Yes | "Algo. 1 highlights the key steps of training the policy" (referring to Algorithm 1, SAFER Algorithm). The paper also presents Algorithm 2 (SAFER Algorithm), Algorithm 3 (MC and MR attacker), Algorithm 4 (SA-PPO-Lagrangian Algorithm), Algorithm 5 (ADV-PPOL Algorithm), and Algorithm 6 (CVPO Algorithm). |
| Open Source Code | No | The paper does not provide concrete access to source code for its methodology. It only mentions 'Video demos can be found in our website: https://sites.google.com/view/saferrl/home.', which points to demo videos, not code. |
| Open Datasets | Yes | The simulation environments are from a publicly available benchmark (Gronauer, 2022); see the environment-construction sketch after this table. |
| Dataset Splits | No | The paper describes sampling transitions from a replay buffer during training (see the buffer sketch after this table) but does not provide train/validation/test dataset splits (percentages, sample counts, or references to predefined splits) of the kind a supervised learning paper would report. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) needed to replicate the experiment. |
| Experiment Setup | Yes | Section B.3 (Experiment Setting and Hyper-parameters) and Table 4 ("Hyperparameters for on-policy baselines (left) and off-policy baselines (right)") list specific hyperparameters such as training epochs, batch size, particle size, M-step iterations, cost limit, perturbation ϵ, KL thresholds, and learning rates; an illustrative config sketch follows the table. |
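
As a quick orientation for the environments referenced in the Open Datasets row, the sketch below builds the Run/Circle task-robot combinations from the Bullet-Safety-Gym benchmark (Gronauer, 2022). It is not code from the paper: the `Safety{Robot}{Task}-v0` id pattern, the `bullet_safety_gym` import name, the pre-gymnasium 4-tuple step API, and the `cost` key in the step info dict are assumptions about the benchmark and may differ by version.

```python
# Minimal sketch: instantiating the Run/Circle tasks for the four robots
# from Bullet-Safety-Gym (Gronauer, 2022). The env id pattern, the import
# name, and the "cost" key in the info dict are assumptions, not facts
# taken from the paper, and may vary across benchmark versions.
import gym
import bullet_safety_gym  # noqa: F401  (assumed to register the Safety* envs)

ROBOTS = ["Ball", "Car", "Drone", "Ant"]
TASKS = ["Run", "Circle"]

for robot in ROBOTS:
    for task in TASKS:
        env_id = f"Safety{robot}{task}-v0"   # assumed naming convention
        env = gym.make(env_id)
        obs = env.reset()
        total_cost = 0.0
        for _ in range(10):                  # short random rollout
            obs, reward, done, info = env.step(env.action_space.sample())
            total_cost += info.get("cost", 0.0)
            if done:
                break
        env.close()
        print(f"{env_id}: cumulative cost over 10 random steps = {total_cost:.2f}")
```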
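
The Dataset Splits row notes that training samples transitions from a replay buffer rather than from fixed splits. As a point of reference, a minimal generic buffer for safe RL transitions might look like the following; it is an illustrative sketch, not the paper's implementation, and the extra cost signal alongside the reward simply reflects the constrained-RL setting.

```python
# Illustrative replay buffer for safe RL: stores (obs, action, reward, cost,
# next_obs, done) transitions and samples uniform minibatches. Generic sketch,
# not the buffer used in the paper.
import numpy as np

class ReplayBuffer:
    def __init__(self, obs_dim, act_dim, capacity=1_000_000):
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)
        self.cost = np.zeros(capacity, dtype=np.float32)   # constraint signal
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros(capacity, dtype=np.float32)
        self.capacity, self.ptr, self.size = capacity, 0, 0

    def add(self, obs, act, rew, cost, next_obs, done):
        i = self.ptr
        self.obs[i], self.act[i] = obs, act
        self.rew[i], self.cost[i] = rew, cost
        self.next_obs[i], self.done[i] = next_obs, done
        self.ptr = (self.ptr + 1) % self.capacity      # overwrite oldest when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size=256):
        idx = np.random.randint(0, self.size, size=batch_size)
        return dict(obs=self.obs[idx], act=self.act[idx], rew=self.rew[idx],
                    cost=self.cost[idx], next_obs=self.next_obs[idx],
                    done=self.done[idx])
```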
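
Finally, the Experiment Setup row points to Table 4's hyperparameters. The config sketch below only mirrors the parameter names mentioned in that row; every numeric value is a placeholder for illustration and is not a value reported in the paper.

```python
# Illustrative experiment config mirroring the hyperparameter names cited from
# Table 4. All numeric values are placeholders, NOT the paper's values.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    training_epochs: int = 100        # placeholder
    batch_size: int = 256             # placeholder
    particle_size: int = 32           # placeholder
    m_step_iterations: int = 10       # placeholder
    cost_limit: float = 25.0          # placeholder constraint threshold
    perturbation_eps: float = 0.05    # placeholder (perturbation ϵ in Table 4)
    kl_threshold: float = 0.01        # placeholder trust-region bound
    actor_lr: float = 3e-4            # placeholder
    critic_lr: float = 1e-3           # placeholder

config = ExperimentConfig()
print(config)
```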