Robust $\phi$-Divergence MDPs
Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our fast suite of algorithms with the state-of-the-art solver MOSEK 9.3 [3] (commercial) and the first-order method of [14]. All experiments are implemented in C++, and they are run on a 3.6 GHz 8-Core Intel Core i9 CPU with 32 GB 2667 MHz DDR4 main memory. The source code is available at https://sites.google.com/view/clint-chin-pang-ho. Tables 2-4 report average computation times over 50 randomly generated test instances for the KL-divergence and the $\chi^2$-distance based ambiguity sets and show that the proposed algorithms outperform other methods. The tables reveal that our algorithms are about two orders of magnitude faster than MOSEK in solving the projection problem (5). |
| Researcher Affiliation | Academia | Chin Pang Ho City University of Hong Kong clint.ho@cityu.edu.hk Marek Petrik University of New Hampshire mpetrik@cs.unh.edu Wolfram Wiesemann Imperial College London ww@imperial.ac.uk |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the main text of the paper. |
| Open Source Code | Yes | The source code is available at https://sites.google.com/view/clint-chin-pang-ho. |
| Open Datasets | No | For our experiments, we synthetically generate random RMDP instances as follows. For the projection problem, we sample each component of $b$ uniformly at random between 0 and 1. Similarly, we sample each component of $p_{sa}$ uniformly at random between 0 and 1 and subsequently scale $p_{sa}$ so that its elements sum up to 1. The parameter $\beta$, finally, is uniformly distributed between $\min\{b\} + 10^{-8}$ and $p_{sa}^\top b - 10^{-8}$ to adhere to the assumptions of our paper. For the robust Bellman update, all vectors $b_{sa}$ and all transition probabilities $p_{sa}$, $s \in \mathcal{S}$ and $a \in \mathcal{A}$, are generated according to the above procedure. The parameter $\kappa$ is also sampled from a uniform distribution supported on $[0, 1]$. |
| Dataset Splits | Yes | For our experiments, we synthetically generate random RMDP instances as follows. For the projection problem, we sample each component of $b$ uniformly at random between 0 and 1. Similarly, we sample each component of $p_{sa}$ uniformly at random between 0 and 1 and subsequently scale $p_{sa}$ so that its elements sum up to 1. The parameter $\beta$, finally, is uniformly distributed between $\min\{b\} + 10^{-8}$ and $p_{sa}^\top b - 10^{-8}$ to adhere to the assumptions of our paper. For the robust Bellman update, all vectors $b_{sa}$ and all transition probabilities $p_{sa}$, $s \in \mathcal{S}$ and $a \in \mathcal{A}$, are generated according to the above procedure. The parameter $\kappa$ is also sampled from a uniform distribution supported on $[0, 1]$. |
| Hardware Specification | Yes | All experiments are implemented in C++, and they are run on a 3.6 GHz 8-Core Intel Core i9 CPU with 32 GB 2667 MHz DDR4 main memory. |
| Software Dependencies | Yes | We compare our fast suite of algorithms with the state-of-the-art solver MOSEK 9.3 [3] (commercial) and the first-order method of [14]. All experiments are implemented in C++... |
| Experiment Setup | Yes | For our experiments, we synthetically generate random RMDP instances as follows. For the projection problem, we sample each component of $b$ uniformly at random between 0 and 1. Similarly, we sample each component of $p_{sa}$ uniformly at random between 0 and 1 and subsequently scale $p_{sa}$ so that its elements sum up to 1. The parameter $\beta$, finally, is uniformly distributed between $\min\{b\} + 10^{-8}$ and $p_{sa}^\top b - 10^{-8}$ to adhere to the assumptions of our paper. For the robust Bellman update, all vectors $b_{sa}$ and all transition probabilities $p_{sa}$, $s \in \mathcal{S}$ and $a \in \mathcal{A}$, are generated according to the above procedure. The parameter $\kappa$ is also sampled from a uniform distribution supported on $[0, 1]$. |
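The instance-generation procedure quoted above is concrete enough to sketch in code. The following is a minimal Python sketch under stated assumptions: the function name `random_projection_instance` and the use of NumPy are our own (the paper's experiments are implemented in C++), and `eps` stands in for the $10^{-8}$ margin in the quoted procedure.

```python
import numpy as np

def random_projection_instance(n, seed=None, eps=1e-8):
    """Sample one random projection-problem instance per the quoted procedure:
    b ~ U[0,1]^n, p_sa ~ U[0,1]^n rescaled onto the simplex, and
    beta ~ U[min(b) + eps, p_sa @ b - eps]."""
    rng = np.random.default_rng(seed)
    b = rng.uniform(0.0, 1.0, n)
    p_sa = rng.uniform(0.0, 1.0, n)
    p_sa /= p_sa.sum()  # scale so the elements sum up to 1
    # beta lies strictly between min(b) and the nominal value p_sa @ b,
    # matching the paper's stated assumptions on the projection problem
    beta = rng.uniform(b.min() + eps, p_sa @ b - eps)
    return b, p_sa, beta
```

For the robust Bellman update, the quoted procedure additionally samples $\kappa$ uniformly from $[0, 1]$ and draws one $(b_{sa}, p_{sa})$ pair per state-action pair in the same way.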
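The excerpts do not reproduce the paper's projection problem (5) or its algorithms, but the requirement $\min\{b\} < \beta < p_{sa}^\top b$ can be illustrated with a generic KL-divergence projection: minimizing $\mathrm{KL}(p \,\|\, p_{sa})$ over the simplex subject to $p^\top b = \beta$ has an exponential-tilt solution with a single scalar multiplier, which bisection recovers precisely when $\beta$ lies in that interval. This is a standard textbook construction offered only as an illustration, not the paper's (much faster) method.

```python
import numpy as np

def kl_projection(p_hat, b, beta, iters=200):
    """Minimize KL(p || p_hat) over the simplex subject to p @ b = beta.
    The minimizer has the form p_i ∝ p_hat_i * exp(-lam * b_i); the map
    lam -> p(lam) @ b decreases monotonically from p_hat @ b (at lam = 0)
    toward min(b) (as lam -> inf), so bisection on lam finds the root
    whenever min(b) < beta < p_hat @ b."""
    def tilt(lam):
        w = p_hat * np.exp(-lam * (b - b.min()))  # shift exponent for stability
        return w / w.sum()
    if p_hat @ b <= beta:
        return p_hat.copy()  # constraint already met at lam = 0
    lo, hi = 0.0, 1.0
    while tilt(hi) @ b > beta:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if tilt(mid) @ b > beta else (lo, mid)
    return tilt(0.5 * (lo + hi))
```

If $\beta \le \min\{b\}$ no feasible tilt exists and the bracket-growing loop would not terminate, which is exactly why the generated instances keep $\beta$ at least $10^{-8}$ above $\min\{b\}$.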