Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Robust Transfer of Safety-Constrained Reinforcement Learning Agents

Authors: Markel Zubia, Thiago Simão, Nils Jansen

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The empirical evaluation shows that this method yields policies that are robust against changes in dynamics, demonstrating safety after transfer to a new environment.
Researcher Affiliation	Academia	¹Ruhr University Bochum, Germany; ²Eindhoven University of Technology, The Netherlands; ³Radboud University Nijmegen, The Netherlands
Pseudocode	No	The paper describes the methodology in Section 5 ('ROBUST GUIDED SAFE EXPLORATION') using natural language without presenting any formal pseudocode or algorithm blocks.
Open Source Code	Yes	The source code is available on https://github.com/ai-fm/safe-and-robust-transfer
Open Datasets	Yes	We evaluate our method on benchmark environments created using a framework for safe reinforcement learning called Safety-Gymnasium (Ji et al., 2023).
Dataset Splits	Yes	We restrict the uncertainty set to a finite subset ($U$) by discretizing the values of the parameters to $m = m_1, \dots, m_N$ and $\eta = \eta_1, \dots, \eta_N$. In our experiments, we use $N = 8$ values for each parameter by letting $m_i = (0.5 + \frac{i-1}{7})\, m$ and $\eta_i = (0.5 + \frac{i-1}{7})\, \eta$ for $i = 1, \dots, 8$, where $m$ and $\eta$ correspond to the dynamics in the source task.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies	No	The paper mentions 'Safety-Gymnasium' as a framework for benchmark environments but does not provide specific version numbers for it or any other software libraries or dependencies used.
Experiment Setup	Yes	A HYPERPARAMETERS: The hyperparameters in our method are summarized in Table 1. All actor and critic networks are modeled by a multilayer perceptron (MLP). Table 1: The hyperparameters used in the experiments (values listed as M1 / M2 / M3). Actor network size: [256, 256] / [256, 256] / [256, 256]; Critic network size: [256, 256] / [256, 256] / [256, 256]; Size of replay buffer: $10^6$ / $10^6$ / $10^6$; Batch size: 256 / 256 / 256; Steps per epoch: 2000 / 2000 / 2000; Number of epochs: 106 / 106 / 106; Actor learning rate: $5 \times 10^{-6}$ / $5 \times 10^{-6}$ / $5 \times 10^{-6}$; Critic learning rate: $10^{-3}$ / $10^{-3}$ / $10^{-3}$; Lambda learning rate: $5 \times 10^{-7}$ / $5 \times 10^{-7}$ / $5 \times 10^{-7}$; Safety constraint: 5 / 8 / 25.
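
The discretization quoted in the Dataset Splits row scales each source-task parameter by eight evenly spaced factors from 0.5 to 1.5. A minimal Python sketch of that arithmetic follows. The function name `discretize`, the placeholder source values of 1.0, and the pairing of the two parameter lists into a Cartesian product are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
from itertools import product


def discretize(source_value, n=8, low=0.5, width=1.0):
    """Return n scaled copies of source_value.

    The i-th value (i = 1..n) is (low + width * (i - 1) / (n - 1)) * source_value,
    which matches (0.5 + (i - 1) / 7) * source_value when n = 8.
    """
    return [
        (low + width * (i - 1) / (n - 1)) * source_value
        for i in range(1, n + 1)
    ]


# Hypothetical source-task values for the two parameters (placeholders only).
source_m = 1.0
source_eta = 1.0

m_values = discretize(source_m)
eta_values = discretize(source_eta)

# Assumption: the finite set contains every (m_i, eta_j) pair.
grid = list(product(m_values, eta_values))

if __name__ == "__main__":
    print("scaling range for m:", m_values[0], "to", m_values[-1])
    print("number of (m, eta) pairs under the product assumption:", len(grid))
```

With N = 8 the factors are spaced 1/7 apart, so the first and last values are 0.5 and 1.5 times the source value.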