reproducibilityindex.ai

Scalable First-Order Methods for Robust MDPs

Authors: Julien Grand-Clément, Christian Kroer | Pages 12086-12094

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with s-rectangular KL uncertainty sets. ... Empirical performance. We focus our numerical experiments on ellipsoidal and KL uncertainty sets.
Researcher Affiliation	Academia	Julien Grand-Clément, Christian Kroer, IEOR Department, Columbia University, jg3728@columbia.edu, christian.kroer@columbia.edu
Pseudocode	Yes	Algorithm 1: First-order Method for Robust MDP with s-rectangular uncertainty set.
Open Source Code	No	The paper does not provide any statements about releasing source code or links to a code repository.
Open Datasets	No	The paper describes using 'healthcare management instance', 'machine replacement instance', and 'random Garnet MDPs'. While Garnet MDPs are a class, no specific link, DOI, or formal citation is provided for public access to the data instances used in the experiments.
Dataset Splits	No	The paper discusses an 'ϵ-optimal policy' and stopping conditions for Value Iteration but does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts).
Hardware Specification	Yes	All simulations are implemented in Python 3.7.3, and performed on a laptop with 2.2 GHz Intel Core i7 and 8 GB of RAM.
Software Dependencies	Yes	All simulations are implemented in Python 3.7.3, and performed on a laptop with 2.2 GHz Intel Core i7 and 8 GB of RAM. We use Gurobi 8.1.1 to solve any linear or quadratic optimization problems involved.
Experiment Setup	Yes	Input: Number of epochs k, number of iterations per epoch T_1, ..., T_k, weights ω_1, ..., ω_T, and stepsizes τ, σ. ... For averaging the PD iterates, an increasing weight scheme, i.e. p ≥ 1 in ω_t = t^p, is clearly stronger (this is again similar to the matrix-game setting). We also recommend setting q = 2 (or even larger). ... We initialize the algorithms with v_0 = 0. At epoch ℓ of Value, AVI and Anderson, we warm-start each computation of F(v_ℓ) with the optimal solution obtained from the previous epoch ℓ - 1.
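
The increasing weight scheme ω_t = t^p quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' code (none was released); the function name `weighted_average` and the toy iterates are assumptions made for illustration only. It shows how p = 0 gives a uniform average while p ≥ 1 puts more weight on later iterates.

```python
import numpy as np


def weighted_average(iterates, p=1.0):
    """Average iterates x_1, ..., x_T with weights w_t = t**p, normalised to sum to 1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    iterates = np.asarray(iterates, dtype=float)
    num_iterates = iterates.shape[0]
    # Weights t**p for t = 1, ..., T; p = 0 recovers the uniform average.
    weights = np.arange(1, num_iterates + 1, dtype=float) ** p
    weights /= weights.sum()
    # Contract the weight vector against the leading (time) axis of the iterates.
    return np.tensordot(weights, iterates, axes=1)


# Toy iterates (assumed values): two one-dimensional points.
toy_iterates = [[0.0], [3.0]]
uniform = weighted_average(toy_iterates, p=0.0)
linear = weighted_average(toy_iterates, p=1.0)
```

With these two toy iterates, the uniform average sits at the midpoint, whereas p = 1 weights the second iterate twice as heavily as the first and pulls the average toward it.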