Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity
Authors: Laixi Shi, Yuejie Chi
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the gambler's problem (Sutton and Barto, 2018; Zhou et al., 2021) to evaluate the performance of the proposed algorithm DRVI-LCB, with comparisons to the robust value iteration algorithm DRVI without pessimism (Panaganti and Kalathil, 2022). Our code can be accessed at: https://github.com/Laixishi/Robust-RL-with-KL-divergence. ... Figure 1(a) plots the sub-optimality value gap ... Figure 1(b) shows the sub-optimality gap ... Figure 1(c) illustrates the ratio of winning ... Figure 1(d) show that DRVI-LCB performs consistently better than DRVI ... Figure 2 shows the sub-optimality value gap with respect to the number of trajectories K... |
| Researcher Affiliation | Academia | Laixi Shi EMAIL Computing Mathematical Sciences California Institute of Technology Pasadena, CA, 91125, USA Yuejie Chi EMAIL Department of Electrical and Computer Engineering Carnegie Mellon University Pittsburgh, PA, 15213, USA |
| Pseudocode | Yes | Algorithm 1: Two-fold subsampling trick for the finite-horizon setting. ... Algorithm 2: Robust value iteration with LCB (DRVI-LCB) for robust offline RL. ... Algorithm 3: Robust value iteration with LCB (DRVI-LCB) for infinite-horizon RMDPs. |
| Open Source Code | Yes | Our code can be accessed at: https://github.com/Laixishi/Robust-RL-with-KL-divergence. |
| Open Datasets | Yes | We conduct experiments on the gambler's problem (Sutton and Barto, 2018; Zhou et al., 2021) to evaluate the performance of the proposed algorithm DRVI-LCB |
| Dataset Splits | No | The paper describes generating its own |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Gambler's problem. ... with a state space $S = \{0, 1, \ldots, 50\}$ and the associated possible actions $a \in \{0, 1, \ldots, \min\{s, 50 - s\}\}$ at state $s$. Here, we set the horizon length $H = 100$. ... We evaluate the performance of the learned policy $\widehat{\pi}$ using our proposed method DRVI-LCB with comparison to DRVI without pessimism, where we fix the uncertainty level $\sigma = 0.1$ for learning the robust optimal policy. ... Figure 1(b) shows the sub-optimality gap $V_1^{\star,\sigma}(\rho) - V_1^{\widehat{\pi},\sigma}(\rho)$ with varying sample sizes $N = 100, 300, 1000, 3000, 5000$... |
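
The Experiment Setup row above can be made concrete with a small sketch of the gambler's problem environment. This is only an illustration of the quoted state space, action sets, horizon, and uncertainty level; it is not the authors' code and it does not implement DRVI-LCB. The heads probability `P_HEAD` and the win/lose transition rule (stake `a` is added on heads, subtracted on tails) are assumptions taken from the textbook version of the problem, since the excerpt does not state them.

```python
import numpy as np

# Values quoted in the Experiment Setup row.
GOAL = 50                 # largest state in S = {0, 1, ..., 50}
N_STATES = GOAL + 1       # 51 states
HORIZON = 100             # H = 100
SIGMA = 0.1               # uncertainty level used for the robust policy
SAMPLE_SIZES = [100, 300, 1000, 3000, 5000]  # N values listed for Figure 1(b)

# ASSUMPTION: the heads probability is not given in the excerpt;
# 0.6 is a hypothetical placeholder for illustration only.
P_HEAD = 0.6


def actions(s):
    """Stakes available at state s: a in {0, 1, ..., min(s, 50 - s)}."""
    return range(0, min(s, GOAL - s) + 1)


def nominal_transition(s, a, p_head=P_HEAD):
    """Next-state distribution under the assumed textbook dynamics.

    With probability p_head the gambler wins the stake (s -> s + a),
    otherwise loses it (s -> s - a). Returns a length-51 vector.
    """
    if a not in actions(s):
        raise ValueError(f"stake {a} is not allowed at state {s}")
    p = np.zeros(N_STATES)
    p[s + a] += p_head
    p[s - a] += 1.0 - p_head
    return p


# Total number of (state, action) pairs in this tabular problem.
n_pairs = sum(len(actions(s)) for s in range(N_STATES))
```

Using `+=` in `nominal_transition` keeps the zero-stake case correct: when `a = 0`, both outcomes land on the same state and the probabilities add up to 1.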