Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity

Authors: Laixi Shi, Yuejie Chi

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on the gambler's problem (Sutton and Barto, 2018; Zhou et al., 2021) to evaluate the performance of the proposed algorithm DRVI-LCB, with comparisons to the robust value iteration algorithm DRVI without pessimism (Panaganti and Kalathil, 2022). Our code can be accessed at: https://github.com/Laixishi/Robust-RL-with-KL-divergence. ... Figure 1(a) plots the sub-optimality value gap ... Figure 1(b) shows the sub-optimality gap ... Figure 1(c) illustrates the ratio of winning ... Figure 1(d) shows that DRVI-LCB performs consistently better than DRVI ... Figure 2 shows the sub-optimality value gap with respect to the number of trajectories K...
Researcher Affiliation | Academia | Laixi Shi, Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA; Yuejie Chi, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Pseudocode | Yes | Algorithm 1: Two-fold subsampling trick for the finite-horizon setting. ... Algorithm 2: Robust value iteration with LCB (DRVI-LCB) for robust offline RL. ... Algorithm 3: Robust value iteration with LCB (DRVI-LCB) for infinite-horizon RMDPs.
Open Source Code | Yes | Our code can be accessed at: https://github.com/Laixishi/Robust-RL-with-KL-divergence.
Open Datasets | Yes | We conduct experiments on the gambler's problem (Sutton and Barto, 2018; Zhou et al., 2021) to evaluate the performance of the proposed algorithm DRVI-LCB
Dataset Splits | No | The paper describes generating its own data from the simulated environment rather than using predefined training/validation/test splits.
Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Gambler's problem. ... with a state space S = {0, 1, ..., 50} and the associated possible actions a ∈ {0, 1, ..., min{s, 50 − s}} at state s. Here, we set the horizon length H = 100. ... We evaluate the performance of the learned policy π̂ using our proposed method DRVI-LCB with comparison to DRVI without pessimism, where we fix the uncertainty level σ = 0.1 for learning the robust optimal policy. ... Figure 1(b) shows the sub-optimality gap V^{⋆,σ}_1(ρ) − V^{π̂,σ}_1(ρ) with varying sample sizes N = 100, 300, 1000, 3000, 5000...
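The quoted setup pins down the gambler's-problem MDP concretely (states {0, ..., 50}, stakes up to min{s, 50 − s}). A minimal tabular construction is sketched below; the head probability of 0.4 and the stay-put convention for illegal stakes are illustrative assumptions, as the quoted text states neither.

```python
import numpy as np

def gamblers_problem(goal=50, p_head=0.4):
    """Tabular MDP for the gambler's problem described in the setup:
    states s in {0, ..., goal}; legal stakes a in {0, ..., min(s, goal - s)};
    a head (prob. p_head) moves s -> s + a, a tail moves s -> s - a.
    NOTE: p_head = 0.4 and the self-loop convention for illegal stakes
    are assumptions of this sketch, not taken from the quoted text."""
    S = goal + 1
    A = goal // 2 + 1  # the largest legal stake is goal // 2 (at s = goal // 2)
    P = np.zeros((S, A, S))   # transition kernel P[s, a, s']
    r = np.zeros((S, A))      # expected immediate reward
    for s in range(S):
        max_stake = min(s, goal - s)
        for a in range(max_stake + 1):
            P[s, a, s + a] += p_head
            P[s, a, s - a] += 1.0 - p_head
            if s < goal and s + a == goal:
                r[s, a] = p_head  # reward 1 for reaching the goal, in expectation
        for a in range(max_stake + 1, A):
            P[s, a, s] = 1.0      # illegal stakes: stay put
    return P, r
```

An offline dataset of K trajectories can then be drawn by rolling a behavior policy through `P` for H = 100 steps, matching the sample-size sweep quoted above.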
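The pseudocode row names DRVI-LCB: robust value iteration with a lower-confidence-bound penalty over a KL uncertainty set. The recipe can be sketched using the standard dual form of the KL-robust expectation; the grid search over the dual variable, the penalty array `b`, and the clipping at zero are assumptions of this sketch, not a transcription of Algorithm 2 in the paper.

```python
import numpy as np

def kl_robust_expectation(p_hat, v, sigma, lam_grid=np.logspace(-3, 3, 200)):
    """Worst-case expectation of v over a KL ball of radius sigma around the
    empirical distribution p_hat, via the usual dual form
        inf_{KL(P || p_hat) <= sigma} E_P[v]
          = sup_{lam >= 0} { -lam * log E_{p_hat}[exp(-v / lam)] - lam * sigma },
    solved here by a crude grid search over the dual variable (illustrative)."""
    best = v[p_hat > 0].min()  # lam -> 0 limit: worst value on the support
    for lam in lam_grid:
        m = (-v / lam).max()   # log-sum-exp stabilization
        log_mgf = m + np.log(p_hat @ np.exp(-v / lam - m))
        best = max(best, -lam * log_mgf - lam * sigma)
    return best

def drvi_lcb(P_hat, r, sigma, b):
    """Finite-horizon robust value iteration with an LCB-style penalty b[h, s, a].
    P_hat: empirical transitions (S, A, S); r: rewards (S, A). The penalty
    schedule b and the clipping at zero are assumptions of this sketch."""
    H, S, A = b.shape
    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):  # backward induction over the horizon
        Q = np.zeros((S, A))
        for s in range(S):
            for a in range(A):
                robust = kl_robust_expectation(P_hat[s, a], V[h + 1], sigma)
                Q[s, a] = max(0.0, r[s, a] + robust - b[h, s, a])
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, pi
```

Subtracting the penalty before clipping is what makes the estimate pessimistic: actions whose empirical robust value is uncertain are discounted, which is the mechanism the comparison with penalty-free DRVI isolates.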