Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Partial Policy Iteration for L1-Robust Markov Decision Processes
Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with a robust value iteration. ... We now compare the runtimes of PPI (Algorithm 4.1) combined with the homotopy method (Algorithm 5.1) and the bisection method (Algorithm 6.1) with the runtime of a naive approach that combines the robust value iteration with a computation of the robust Bellman optimality operator L using a general LP solver. We use Gurobi 9.0, a state-of-the-art commercial optimization package. |
| Researcher Affiliation | Academia | Chin Pang Ho EMAIL School of Data Science City University of Hong Kong 83 Tat Chee Avenue Kowloon Tong, Hong Kong Marek Petrik EMAIL Department of Computer Science University of New Hampshire Durham, NH, USA, 03861 Wolfram Wiesemann EMAIL Imperial College Business School Imperial College London London SW7 2AZ, United Kingdom |
| Pseudocode | Yes | Algorithm 4.1: Partial Policy Iteration (PPI) Algorithm 5.1: Homotopy method to compute qpξq. Algorithm 5.2: Identify non-dominated receivers i P S. Algorithm 6.1: Bisection scheme for the robust Bellman optimality operator (3.7) Algorithm B.1: Quasi-linear time bisection scheme for solving (3.7) |
| Open Source Code | Yes | A well-tested and documented implementation of the methods described in this paper is available at https://github.com/marekpetrik/craam2. ... The source code of the implementation is available at http://github. com/marekpetrik/PPI_paper. |
| Open Datasets | Yes | The two domains are the inventory management problem (Zipkin, 2000; Porteus, 2002) and the cart-pole problem (Lagoudakis and Parr, 2003). ... To facilitate the reproducibility of the domains, CSV files with the precise specification of the RMDPs being solved are available at http://github.com/marekpetrik/PPI_paper. |
| Dataset Splits | No | The paper describes the discretization of the cart-pole problem's continuous state space, stating: "The discretization follows a standard procedure in which random samples from the domain are subsampled to represent the discretized state space." However, it does not provide explicit training/test/validation dataset splits typically seen in machine learning experiments for reproducing data partitioning. |
| Hardware Specification | Yes | All algorithms were implemented in C++, parallelized using the Open MP library, and used the Eigen library to perform linear algebra operations. The algorithms were compiled with GCC 9.3 and executed on an AMD Ryzen 9 3900X CPU with 64GB RAM. |
| Software Dependencies | Yes | We use Gurobi 9.0, a state-of-the-art commercial optimization package. All algorithms were implemented in C++, parallelized using the Open MP library, and used the Eigen library to perform linear algebra operations. The algorithms were compiled with GCC 9.3 and executed on an AMD Ryzen 9 3900X CPU with 64GB RAM. |
| Experiment Setup | Yes | We use a discount factor of 0.995. In our experiments, the fixed and variable ordering costs are 5.99 and 1.0, respectively. The inventory holding and backlogging costs are 0.1 and 0.15, respectively. ... The unit sales price is 1.6. The demand in each period follows the Gaussian distribution with a mean of I{2 and a standard deviation of I{5 and is rounded to the closest integer. We set the ambiguity budget to κs,a 0.2 and κs 1.0 in the sa-rectangular and s-rectangular version of our inventory management problem, respectively, and we set κs,a κs 0.1 in our cartpole problem. ... Finally, we set ϵk 1 mintγ2ϵk, 0.5{p1 γq }Lπkvk vk}8u in Algorithm 4.1, which satisfies the convergence condition in Theorem 4.5. ... To this end, we choose a precision of δ 40 (that is, }Lπkvk vk}8 ď 0.1), as defined in Algorithm 4.1, for our inventory management problem, as well as a smaller precision of δ 4 (that is, }Lπkvk vk}8 ď 0.01) for our cart-pole problem... |