Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Scalable First-Order Methods for Robust MDPs
Authors: Julien Grand-Clément, Christian Kroer | Pages: 12086-12094
AAAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with s-rectangular KL uncertainty sets. ... Empirical performance. We focus our numerical experiments on ellipsoidal and KL uncertainty sets. |
| Researcher Affiliation | Academia | Julien Grand-Clément, Christian Kroer IEOR Department, Columbia University EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 First-order Method for Robust MDP with s-rectangular uncertainty set. |
| Open Source Code | No | The paper does not provide any statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper describes using 'healthcare management instance', 'machine replacement instance', and 'random Garnet MDPs'. While Garnet MDPs are a class, no specific link, DOI, or formal citation is provided for public access to the data instances used in the experiments. |
| Dataset Splits | No | The paper discusses an 'ϵ-optimal policy' and stopping conditions for Value Iteration but does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts). |
| Hardware Specification | Yes | All simulations are implemented in Python 3.7.3, and performed on a laptop with 2.2 GHz Intel Core i7 and 8 GB of RAM. |
| Software Dependencies | Yes | All simulations are implemented in Python 3.7.3, and performed on a laptop with 2.2 GHz Intel Core i7 and 8 GB of RAM. We use Gurobi 8.1.1 to solve any linear or quadratic optimization problems involved. |
| Experiment Setup | Yes | Input: Number of epochs k, number of iterations per epoch T1, ..., Tk, weights ω1, ..., ωT, and stepsizes τ, σ. ... For averaging the PD iterates, an increasing weight scheme, i.e. p ≥ 1 in ωt = t^p, is clearly stronger (this is again similar to the matrix-game setting). We also recommend setting q = 2 (or even larger). ... We initialize the algorithms with v0 = 0. At epoch ℓ of Value, AVI and Anderson, we warm-start each computation of F(vℓ) with the optimal solution obtained from the previous epoch ℓ − 1. |
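
The Experiment Setup excerpt mentions averaging the primal-dual iterates with increasing weights of the form ωt = t^p. Since the paper releases no code, the following is only a minimal sketch of what such a weighted average could look like. The function name `weighted_average`, the list-of-lists representation of iterates, and the default exponent are illustrative assumptions and are not taken from the authors' implementation.

```python
def weighted_average(iterates, p=1.0):
    """Return the average of iterates x_1..x_T with weights w_t = t**p.

    `iterates` is a non-empty list of equal-length lists of floats
    (hypothetical representation). p = 0 gives the uniform average;
    larger p puts more weight on later iterates.
    """
    T = len(iterates)
    # Polynomially increasing weights w_t = t^p for t = 1..T.
    weights = [float(t) ** p for t in range(1, T + 1)]
    total = sum(weights)
    dim = len(iterates[0])
    # Weighted sum per coordinate, normalized by the total weight.
    return [
        sum(w * x[i] for w, x in zip(weights, iterates)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    xs = [[0.0, 1.0], [3.0, 1.0]]
    print(weighted_average(xs, p=0.0))  # uniform average
    print(weighted_average(xs, p=1.0))  # linear weights favor the last iterate
```

With p = 0 the sketch reduces to the plain average, which makes it easy to compare the uniform and increasing schemes on the same sequence of iterates.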