Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction
Authors: Yiting He, Zhishuai Liu, Weixin Wang, Pan Xu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our theoretical results through comprehensive numerical experiments. ... We conduct comprehensive numerical experiments to validate our theoretical findings. In a simulated MDP, we show that the performance of learned policies degrades as Cvr increases. We evaluate our algorithms in a simulated RMDP and the Frozen Lake environment, highlighting their effectiveness when distribution shifts are significant. |
| Researcher Affiliation | Academia | 1Duke University. Correspondence to: Pan Xu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Online Robust Bellman Iteration (ORBIT) ... Algorithm 2 A more efficient solver for the CRMDP-TV Setting |
| Open Source Code | Yes | The implementation of our ORBIT algorithm is available at https://github.com/panxulab/Online-Robust-Bellman-Iteration. |
| Open Datasets | Yes | Now we test our algorithm in a hard-to-explore setting, the Frozen Lake problem. ... We use the default map in the Open AI Gym library, which is illustrated in Example A.1 |
| Dataset Splits | No | The paper describes online interaction with environments (simulated MDPs, Frozen Lake) for K episodes and evaluates the learned policies in target environments with different perturbation rates. However, it does not provide specific training/test/validation dataset splits in the traditional sense, as data is generated dynamically through interaction. |
| Hardware Specification | Yes | All numerical experiments were conducted on a server equipped with Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz. |
| Software Dependencies | No | The paper mentions using the 'Open AI Gym library' but does not specify a version number for it or any other key software dependencies. |
| Experiment Setup | Yes | We set H = 25 and K = 1,000 in Algorithm 1. The hyperparameter ρ in the constrained setting, β in the regularized setting, and cbonus are tuned from {0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1}, with the final choice presented in Table 2. ... Table 2. hyper-parameters for Section 6.2 (Learning on Simulated RMDPs) ... Table 3. hyper-parameters for Section 6.3 (Learning the Frozen Lake Problem) |
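
The tuning procedure quoted in the Experiment Setup row can be illustrated with a minimal sketch. It only enumerates the candidate values listed there (H = 25, K = 1,000, and the seven-value grid). The function names `candidate_configs` and `select_best`, the `evaluate` callback, and the choice to search ρ (or β) jointly with the bonus coefficient are assumptions made for illustration; the paper's own tuning code may differ, and its final choices are the ones reported in its Tables 2 and 3.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
H = 25          # horizon length
K = 1_000       # number of episodes
GRID = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1]

# Hypothetical mapping: which uncertainty parameter each setting tunes.
PARAM_NAME = {"constrained": "rho", "regularized": "beta"}


def candidate_configs(setting):
    """Yield one config dict per (rho or beta, c_bonus) pair.

    Assumption: both hyperparameters are searched jointly over the
    same grid; the source only states the set they are tuned from.
    """
    name = PARAM_NAME[setting]
    for value, c_bonus in product(GRID, GRID):
        yield {"H": H, "K": K, name: value, "c_bonus": c_bonus}


def select_best(setting, evaluate):
    """Return the config with the highest score under `evaluate`.

    `evaluate` is a hypothetical callback that would train and score
    a policy for a given config; it is not part of the paper's API.
    """
    return max(candidate_configs(setting), key=evaluate)
```

With a seven-value grid, this joint search visits 49 configurations per setting, so `evaluate` would be called 49 times for each of the constrained and regularized cases.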