Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction

Authors: Yiting He, Zhishuai Liu, Weixin Wang, Pan Xu

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we validate our theoretical results through comprehensive numerical experiments. ... We conduct comprehensive numerical experiments to validate our theoretical findings. In a simulated MDP, we show that the performance of learned policies degrades as C_vr increases. We evaluate our algorithms in a simulated RMDP and the Frozen Lake environment, highlighting their effectiveness when distribution shifts are significant.
Researcher Affiliation	Academia	1Duke University. Correspondence to: Pan Xu <EMAIL>.
Pseudocode	Yes	Algorithm 1 Online Robust Bellman Iteration (ORBIT) ... Algorithm 2 A more efficient solver for the CRMDP-TV Setting
Open Source Code	Yes	The implementation of our ORBIT algorithm is available at https://github.com/panxulab/Online-Robust-Bellman-Iteration.
Open Datasets	Yes	Now we test our algorithm in a hard-to-explore setting, the Frozen Lake problem. ... We use the default map in the Open AI Gym library, which is illustrated in Example A.1
Dataset Splits	No	The paper describes online interaction with environments (simulated MDPs, Frozen Lake) for K episodes and evaluates the learned policies in target environments with different perturbation rates. However, it does not provide specific training/test/validation dataset splits in the traditional sense, as data is generated dynamically through interaction.
Hardware Specification	Yes	All numerical experiments were conducted on a server equipped with Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz.
Software Dependencies	No	The paper mentions using the 'Open AI Gym library' but does not specify a version number for it or any other key software dependencies.
Experiment Setup	Yes	We set H = 25 and K = 1,000 in Algorithm 1. The hyperparameter ρ in the constrained setting, β in the regularized setting, and c_bonus are tuned from {0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1}, with the final choice presented in Table 2. ... Table 2. hyper-parameters for Section 6.2 (Learning on Simulated RMDPs) ... Table 3. hyper-parameters for Section 6.3 (Learning the Frozen Lake Problem)
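
The tuning grid quoted in the Experiment Setup row can be written out as a short sketch. This is an illustration only, not the authors' code: the function name `candidate_configs` and the dictionary keys are hypothetical, and it assumes a full joint grid over the uncertainty parameter and the bonus coefficient, which the quoted text does not confirm (each parameter may have been tuned separately).

```python
from itertools import product

# Values quoted in the Experiment Setup row.
H = 25      # horizon used in Algorithm 1
K = 1_000   # number of episodes used in Algorithm 1
GRID = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1]


def candidate_configs(setting):
    """Enumerate candidate hyperparameter settings for one problem variant.

    Assumption: a joint grid over (rho or beta) and c_bonus. The source only
    states that each parameter is tuned from GRID.
    """
    if setting == "constrained":
        name = "rho"    # uncertainty-set radius in the constrained setting
    elif setting == "regularized":
        name = "beta"   # regularization weight in the regularized setting
    else:
        raise ValueError(f"unknown setting: {setting!r}")
    return [
        {name: value, "c_bonus": c_bonus, "H": H, "K": K}
        for value, c_bonus in product(GRID, GRID)
    ]


constrained = candidate_configs("constrained")
regularized = candidate_configs("regularized")
print(len(constrained), len(regularized))
```

Under the joint-grid assumption each setting has 7 × 7 = 49 candidate configurations; the values actually selected are those reported in Tables 2 and 3 of the paper.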