Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Partial Policy Iteration for L1-Robust Markov Decision Processes

Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with a robust value iteration. ... We now compare the runtimes of PPI (Algorithm 4.1) combined with the homotopy method (Algorithm 5.1) and the bisection method (Algorithm 6.1) with the runtime of a naive approach that combines the robust value iteration with a computation of the robust Bellman optimality operator L using a general LP solver. We use Gurobi 9.0, a state-of-the-art commercial optimization package.
Researcher Affiliation Academia Chin Pang Ho EMAIL School of Data Science City University of Hong Kong 83 Tat Chee Avenue Kowloon Tong, Hong Kong Marek Petrik EMAIL Department of Computer Science University of New Hampshire Durham, NH, USA, 03861 Wolfram Wiesemann EMAIL Imperial College Business School Imperial College London London SW7 2AZ, United Kingdom
Pseudocode Yes Algorithm 4.1: Partial Policy Iteration (PPI) Algorithm 5.1: Homotopy method to compute qpξq. Algorithm 5.2: Identify non-dominated receivers i P S. Algorithm 6.1: Bisection scheme for the robust Bellman optimality operator (3.7) Algorithm B.1: Quasi-linear time bisection scheme for solving (3.7)
Open Source Code Yes A well-tested and documented implementation of the methods described in this paper is available at https://github.com/marekpetrik/craam2. ... The source code of the implementation is available at http://github. com/marekpetrik/PPI_paper.
Open Datasets Yes The two domains are the inventory management problem (Zipkin, 2000; Porteus, 2002) and the cart-pole problem (Lagoudakis and Parr, 2003). ... To facilitate the reproducibility of the domains, CSV files with the precise specification of the RMDPs being solved are available at http://github.com/marekpetrik/PPI_paper.
Dataset Splits No The paper describes the discretization of the cart-pole problem's continuous state space, stating: "The discretization follows a standard procedure in which random samples from the domain are subsampled to represent the discretized state space." However, it does not provide explicit training/test/validation dataset splits typically seen in machine learning experiments for reproducing data partitioning.
Hardware Specification Yes All algorithms were implemented in C++, parallelized using the Open MP library, and used the Eigen library to perform linear algebra operations. The algorithms were compiled with GCC 9.3 and executed on an AMD Ryzen 9 3900X CPU with 64GB RAM.
Software Dependencies Yes We use Gurobi 9.0, a state-of-the-art commercial optimization package. All algorithms were implemented in C++, parallelized using the Open MP library, and used the Eigen library to perform linear algebra operations. The algorithms were compiled with GCC 9.3 and executed on an AMD Ryzen 9 3900X CPU with 64GB RAM.
Experiment Setup Yes We use a discount factor of 0.995. In our experiments, the fixed and variable ordering costs are 5.99 and 1.0, respectively. The inventory holding and backlogging costs are 0.1 and 0.15, respectively. ... The unit sales price is 1.6. The demand in each period follows the Gaussian distribution with a mean of I{2 and a standard deviation of I{5 and is rounded to the closest integer. We set the ambiguity budget to κs,a 0.2 and κs 1.0 in the sa-rectangular and s-rectangular version of our inventory management problem, respectively, and we set κs,a κs 0.1 in our cartpole problem. ... Finally, we set ϵk 1 mintγ2ϵk, 0.5{p1 γq }Lπkvk vk}8u in Algorithm 4.1, which satisfies the convergence condition in Theorem 4.5. ... To this end, we choose a precision of δ 40 (that is, }Lπkvk vk}8 ď 0.1), as defined in Algorithm 4.1, for our inventory management problem, as well as a smaller precision of δ 4 (that is, }Lπkvk vk}8 ď 0.01) for our cart-pole problem...