Scalable First-Order Methods for Robust MDPs

Authors: Julien Grand-Clément, Christian Kroer
Pages: 12086-12094

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with s-rectangular KL uncertainty sets. ... Empirical performance. We focus our numerical experiments on ellipsoidal and KL uncertainty sets.
Researcher Affiliation | Academia | Julien Grand-Clément, Christian Kroer, IEOR Department, Columbia University, jg3728@columbia.edu, christian.kroer@columbia.edu
Pseudocode | Yes | Algorithm 1: First-order Method for Robust MDP with s-rectangular uncertainty set.
Open Source Code | No | The paper does not state that source code is released and gives no link to a code repository.
Open Datasets | No | The paper describes using a 'healthcare management instance', a 'machine replacement instance', and 'random Garnet MDPs'. While Garnet MDPs are a class of instances, no specific link, DOI, or formal citation is provided for public access to the data instances used in the experiments.
Dataset Splits | No | The paper discusses an 'ϵ-optimal policy' and stopping conditions for Value Iteration but does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts).
Hardware Specification | Yes | All simulations are implemented in Python 3.7.3, and performed on a laptop with 2.2 GHz Intel Core i7 and 8 GB of RAM.
Software Dependencies | Yes | All simulations are implemented in Python 3.7.3, and performed on a laptop with 2.2 GHz Intel Core i7 and 8 GB of RAM. We use Gurobi 8.1.1 to solve any linear or quadratic optimization problems involved.
Experiment Setup | Yes | Input: Number of epochs k, number of iterations per epoch T_1, ..., T_k, weights ω_1, ..., ω_T, and stepsizes τ, σ. ... For averaging the PD iterates, an increasing weight scheme, i.e. p ≥ 1 in ω_t = t^p, is clearly stronger (this is again similar to the matrix-game setting). We also recommend setting q = 2 (or even larger). ... We initialize the algorithms with v_0 = 0. At epoch ℓ of Value, AVI and Anderson, we warm-start each computation of F(v_ℓ) with the optimal solution obtained from the previous epoch ℓ − 1.
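The weighted averaging of iterates mentioned under Experiment Setup can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `averaging_weights`, `weighted_average`, and `epoch_lengths` are hypothetical, and the epoch schedule of ℓ^q iterations in epoch ℓ is an assumption, since the quoted text only recommends q = 2 without defining q.

```python
from typing import List, Sequence


def averaging_weights(num_iters: int, p: float) -> List[float]:
    """Normalized weights proportional to t**p for t = 1..num_iters.

    p = 0 gives uniform averaging; larger p puts more mass on later iterates.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    raw = [float(t) ** p for t in range(1, num_iters + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_average(iterates: Sequence[Sequence[float]], p: float) -> List[float]:
    """Coordinate-wise weighted average of the iterates with weights t**p."""
    weights = averaging_weights(len(iterates), p)
    dim = len(iterates[0])
    return [sum(w * x[i] for w, x in zip(weights, iterates)) for i in range(dim)]


def epoch_lengths(num_epochs: int, q: int) -> List[int]:
    """ASSUMPTION (not stated in the excerpt): epoch l runs l**q iterations."""
    return [l ** q for l in range(1, num_epochs + 1)]


if __name__ == "__main__":
    # Three one-dimensional iterates; linear weights favour the last one.
    xs = [[1.0], [2.0], [3.0]]
    print(weighted_average(xs, p=0))  # uniform average
    print(weighted_average(xs, p=1))  # linearly increasing weights
    print(epoch_lengths(4, q=2))
```

With p = 0 the result is the plain mean of the iterates, while p = 1 shifts the average toward later iterates, which is the behaviour the quoted recommendation relies on.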