Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Partial Policy Iteration for L1-Robust Markov Decision Processes

Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with a robust value iteration. ... We now compare the runtimes of PPI (Algorithm 4.1) combined with the homotopy method (Algorithm 5.1) and the bisection method (Algorithm 6.1) with the runtime of a naive approach that combines the robust value iteration with a computation of the robust Bellman optimality operator L using a general LP solver. We use Gurobi 9.0, a state-of-the-art commercial optimization package.
Researcher Affiliation	Academia	Chin Pang Ho EMAIL School of Data Science City University of Hong Kong 83 Tat Chee Avenue Kowloon Tong, Hong Kong Marek Petrik EMAIL Department of Computer Science University of New Hampshire Durham, NH, USA, 03861 Wolfram Wiesemann EMAIL Imperial College Business School Imperial College London London SW7 2AZ, United Kingdom
Pseudocode	Yes	Algorithm 4.1: Partial Policy Iteration (PPI) Algorithm 5.1: Homotopy method to compute qpξq. Algorithm 5.2: Identify non-dominated receivers i P S. Algorithm 6.1: Bisection scheme for the robust Bellman optimality operator (3.7) Algorithm B.1: Quasi-linear time bisection scheme for solving (3.7)
Open Source Code	Yes	A well-tested and documented implementation of the methods described in this paper is available at https://github.com/marekpetrik/craam2. ... The source code of the implementation is available at http://github. com/marekpetrik/PPI_paper.
Open Datasets	Yes	The two domains are the inventory management problem (Zipkin, 2000; Porteus, 2002) and the cart-pole problem (Lagoudakis and Parr, 2003). ... To facilitate the reproducibility of the domains, CSV ﬁles with the precise speciﬁcation of the RMDPs being solved are available at http://github.com/marekpetrik/PPI_paper.
Dataset Splits	No	The paper describes the discretization of the cart-pole problem's continuous state space, stating: "The discretization follows a standard procedure in which random samples from the domain are subsampled to represent the discretized state space." However, it does not provide explicit training/test/validation dataset splits typically seen in machine learning experiments for reproducing data partitioning.
Hardware Specification	Yes	All algorithms were implemented in C++, parallelized using the Open MP library, and used the Eigen library to perform linear algebra operations. The algorithms were compiled with GCC 9.3 and executed on an AMD Ryzen 9 3900X CPU with 64GB RAM.
Software Dependencies	Yes	We use Gurobi 9.0, a state-of-the-art commercial optimization package. All algorithms were implemented in C++, parallelized using the Open MP library, and used the Eigen library to perform linear algebra operations. The algorithms were compiled with GCC 9.3 and executed on an AMD Ryzen 9 3900X CPU with 64GB RAM.
Experiment Setup	Yes	We use a discount factor of 0.995. In our experiments, the ﬁxed and variable ordering costs are 5.99 and 1.0, respectively. The inventory holding and backlogging costs are 0.1 and 0.15, respectively. ... The unit sales price is 1.6. The demand in each period follows the Gaussian distribution with a mean of I{2 and a standard deviation of I{5 and is rounded to the closest integer. We set the ambiguity budget to κs,a 0.2 and κs 1.0 in the sa-rectangular and s-rectangular version of our inventory management problem, respectively, and we set κs,a κs 0.1 in our cartpole problem. ... Finally, we set ϵk 1 mintγ2ϵk, 0.5{p1 γq }Lπkvk vk}8u in Algorithm 4.1, which satisﬁes the convergence condition in Theorem 4.5. ... To this end, we choose a precision of δ 40 (that is, }Lπkvk vk}8 ď 0.1), as deﬁned in Algorithm 4.1, for our inventory management problem, as well as a smaller precision of δ 4 (that is, }Lπkvk vk}8 ď 0.01) for our cart-pole problem...