Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
Authors: Hao Liang, Zhi-Quan Luo
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the empirical performance of our algorithms, we conducted numerical experiments comparing RODI-MB, RODI-MF, and RODI-Rep with the risk-neutral algorithm UCBVI (Azar et al., 2017), RSVI in Fei et al. (2020), and RSVI2 in Fei et al. (2021). The experimental setup involved an MDP with S = 5 states, A = 5 actions, and a horizon H = 5, mirroring the setup in Du et al. (2022). The results, as illustrated in Figure 1, demonstrate the regret ranking of these algorithms. |
| Researcher Affiliation | Academia | Hao Liang EMAIL School of Science and Engineering The Chinese University of Hong Kong, Shenzhen. Zhi-Quan Luo EMAIL School of Science and Engineering The Chinese University of Hong Kong, Shenzhen. |
| Pseudocode | Yes | Algorithm 1 RODI-MF. Algorithm 2 RODI-MB. Algorithm 3 ROVI. |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it include any links to code repositories. |
| Open Datasets | No | The experimental setup involved an MDP with S = 5 states, A = 5 actions, and a horizon H = 5, mirroring the setup in Du et al. (2022). The paper describes a synthetic MDP environment for experiments and does not use or provide access to any external datasets. |
| Dataset Splits | No | The paper describes a synthetic MDP environment with specific parameters (S=5 states, A=5 actions, H=5 horizon) rather than using an external dataset. Therefore, the concept of training/test/validation splits is not applicable, and no such splits are provided. |
| Hardware Specification | No | The paper mentions numerical experiments but does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run these experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers) used for the implementation or experiments. |
| Experiment Setup | Yes | The experimental setup involved an MDP with S = 5 states, A = 5 actions, and a horizon H = 5, mirroring the setup in Du et al. (2022). We set δ = 0.005 and β = 1.1. |
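The Experiment Setup row only fixes the problem sizes (S = 5 states, A = 5 actions, horizon H = 5) and the constants δ = 0.005 and β = 1.1. A minimal sketch of a tabular episodic MDP with those dimensions might look like the following; the transition probabilities, rewards, and policy here are random placeholders, not the paper's actual environment from Du et al. (2022).

```python
import numpy as np

# Sketch of a tabular episodic MDP matching the reported sizes
# (S = 5 states, A = 5 actions, horizon H = 5). The dynamics and
# rewards are illustrative placeholders; the paper does not specify them.
S, A, H = 5, 5, 5
rng = np.random.default_rng(0)

# P[h, s, a] is a probability distribution over next states;
# R[h, s, a] is an immediate reward in [0, 1].
P = rng.dirichlet(np.ones(S), size=(H, S, A))
R = rng.uniform(0.0, 1.0, size=(H, S, A))

def run_episode(policy):
    """Roll out one episode under a deterministic policy policy[h, s] -> action."""
    s, total = 0, 0.0
    for h in range(H):
        a = policy[h, s]
        total += R[h, s, a]
        s = rng.choice(S, p=P[h, s, a])
    return total

# A trivial fixed policy (always take action 0) as a usage example.
fixed_policy = np.zeros((H, S), dtype=int)
episode_return = run_episode(fixed_policy)
```

Any of the compared algorithms (RODI-MB, RODI-MF, RODI-Rep, UCBVI, RSVI, RSVI2) would interact with such an environment over repeated episodes, with regret measured against the optimal (here, risk-sensitive) return.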