Scalable First-Order Methods for Robust MDPs
Authors: Julien Grand-Clément, Christian Kroer
AAAI 2021, pp. 12086–12094 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with s-rectangular KL uncertainty sets. ... Empirical performance. We focus our numerical experiments on ellipsoidal and KL uncertainty sets. |
| Researcher Affiliation | Academia | Julien Grand-Clément, Christian Kroer; IEOR Department, Columbia University; jg3728@columbia.edu, christian.kroer@columbia.edu |
| Pseudocode | Yes | Algorithm 1 First-order Method for Robust MDP with s-rectangular uncertainty set. |
| Open Source Code | No | The paper does not provide any statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper describes using a 'healthcare management instance', a 'machine replacement instance', and 'random Garnet MDPs'. While Garnet MDPs are a standard class of randomly generated MDPs, no specific link, DOI, or formal citation is provided for public access to the data instances used in the experiments. |
| Dataset Splits | No | The paper discusses an 'ϵ-optimal policy' and stopping conditions for Value Iteration but does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts). |
| Hardware Specification | Yes | All simulations are implemented in Python 3.7.3, and performed on a laptop with 2.2 GHz Intel Core i7 and 8 GB of RAM. |
| Software Dependencies | Yes | All simulations are implemented in Python 3.7.3, and performed on a laptop with 2.2 GHz Intel Core i7 and 8 GB of RAM. We use Gurobi 8.1.1 to solve any linear or quadratic optimization problems involved. |
| Experiment Setup | Yes | Input: Number of epochs k, number of iterations per epoch T1, ..., Tk, weights ω1, ..., ωT, and stepsizes τ, σ. ... For averaging the PD iterates, an increasing weight scheme, i.e. p ≥ 1 in ωt = t^p, is clearly stronger (this is again similar to the matrix-game setting). We also recommend setting q = 2 (or even larger). ... We initialize the algorithms with v0 = 0. At epoch ℓ of Value, AVI and Anderson, we warm-start each computation of F(vℓ) with the optimal solution obtained from the previous epoch ℓ − 1. |
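
The increasing-weight averaging quoted above (ωt = t^p with p ≥ 1) can be illustrated with a minimal, self-contained sketch. The snippet below is not the paper's Algorithm 1: the bilinear matrix game, the multiplicative-weights updates, the function name `weighted_primal_dual`, and the step sizes τ, σ are illustrative placeholders, chosen only to show how weighted iterate averaging in a primal-dual loop works in the matrix-game setting the authors compare against.

```python
# Minimal sketch (not the authors' code): weighted averaging of primal-dual
# iterates with omega_t = t**p, applied to a small matrix game min_x max_y x^T A y
# over probability simplices, using entropic (multiplicative-weights) updates.
import numpy as np

def weighted_primal_dual(A, T=2000, tau=0.1, sigma=0.1, p=2):
    """Run T mirror-descent/ascent steps and return the omega_t = t**p
    weighted averages of the primal and dual iterates."""
    n, m = A.shape
    x = np.full(n, 1.0 / n)   # primal iterate on the simplex
    y = np.full(m, 1.0 / m)   # dual iterate on the simplex
    x_avg, y_avg, weight_sum = np.zeros(n), np.zeros(m), 0.0
    for t in range(1, T + 1):
        # placeholder primal-dual dynamics: descent in x, ascent in y
        x = x * np.exp(-tau * (A @ y))
        x /= x.sum()
        y = y * np.exp(sigma * (A.T @ x))
        y /= y.sum()
        w = t ** p            # increasing weights favour later iterates
        x_avg += w * x
        y_avg += w * y
        weight_sum += w
    return x_avg / weight_sum, y_avg / weight_sum

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    x_bar, y_bar = weighted_primal_dual(A)
    # duality gap of the averaged strategies (smaller is better)
    gap = (A.T @ x_bar).max() - (A @ y_bar).min()
    print(f"approximate duality gap: {gap:.4f}")
```

With p = 0 this reduces to uniform averaging; larger p places more weight on later iterates, which is the behavior the paper's recommendation of an increasing weight scheme refers to.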