Fast Bellman Updates for Wasserstein Distributionally Robust MDPs

Authors: Zhuodong Yu, Ling Dai, Shaohang Xu, Siyang Gao, Chin Pang Ho

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical experiments demonstrate that the proposed algorithms outperform other state-of-the-art solution methods.
Researcher Affiliation | Academia | (1) School of Data Science, City University of Hong Kong, Hong Kong; (2) Department of Systems Engineering, City University of Hong Kong, Hong Kong
Pseudocode | Yes | Algorithm 1: Nested bisection method to compute the distributionally robust Bellman update (5). (A generic nested-bisection sketch follows this table.)
Open Source Code | Yes | We will release our code to ensure reproducibility. https://github.com/Chill-zd/Fast-Bellman-Updates-DRMDP
Open Datasets | Yes | Our algorithms are tested on random instances generated by the Generalized Average Reward Non-stationary Environment Test-bench (Garnet MDPs) (Archibald et al., 1995; Bhatnagar et al., 2009). (An illustrative Garnet-style generator is sketched after this table.)
Dataset Splits | No | The paper does not explicitly provide the training/validation/test dataset splits needed to reproduce the experiments.
Hardware Specification | Yes | All experiments are implemented in Python 3.8 and run on a 2.3 GHz 4-core Intel Core i7 CPU with 32 GB of 3733 MHz DDR4 main memory.
Software Dependencies | Yes | All experiments are implemented in Python 3.8. We compare our fast algorithm with the state-of-the-art solver Gurobi, version v10.0.1rc0 (Gurobi Optimization, LLC, 2023), and the first-order method of Grand-Clément and Kroer (2021a).
Experiment Setup | Yes | Following the same setting as Grand-Clément and Kroer (2021a), we set n_b to 0.2 and draw random uniform rewards in [0, 10]. The discount factor λ is fixed at 0.8, and the parameter ϵ = 0.1. For each MDP instance, we generate the sampled kernels p^1, ..., p^N as N small random (Garnet) perturbations around the nominal kernel p^0. We set the parameter θ in Proposition 4.1 to n_b A. To test the speed of the Bellman update, we run the random instances 50 times for all algorithms and report the average times in Tables 1, 2, and 3. We also compare the speed of value iteration for all algorithms using the same convergence criterion, ‖v − v′‖ ≤ 2λϵ(1 − λ)⁻¹, which follows Grand-Clément and Kroer (2021a). (A value-iteration sketch with this stopping rule follows this table.)
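
The pseudocode row names a nested bisection method. As a rough, generic illustration of what "nested bisection" means (an outer bisection whose objective is itself evaluated by an inner bisection), here is a minimal Python sketch on a toy root-finding problem; it is not the paper's Algorithm 1, and the functions `inner_root` and `outer_root`, their bracketing intervals, and the tolerances are purely illustrative assumptions.

```python
def bisect(f, lo, hi, tol=1e-8, max_iter=100):
    """Plain bisection: find x in [lo, hi] with f(x) ~ 0,
    assuming f is increasing with f(lo) <= 0 <= f(hi)."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def inner_root(x):
    # Inner bisection: the unique y solving y^3 + y - x = 0.
    return bisect(lambda y: y**3 + y - x, -10.0, 10.0)

def outer_root(target):
    # Outer bisection: the x whose inner solution equals `target`,
    # i.e. each outer function evaluation runs a full inner bisection.
    return bisect(lambda x: inner_root(x) - target, -1e3, 1e3)

print(outer_root(1.5))  # analytically x = 1.5**3 + 1.5 = 4.875
```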
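
The Garnet MDPs referenced in the datasets row can be generated compactly: for each state-action pair, probability mass is placed on a small random subset of next states, and rewards are drawn uniformly. The sketch below is a minimal Garnet-style generator; the parameter names (`n_states`, `n_actions`, `branching`, `r_max`) and the perturbation helper are illustrative assumptions consistent with the quoted setup (branching fraction 0.2, rewards in [0, 10]), not the paper's exact code.

```python
import numpy as np

def garnet_mdp(n_states, n_actions, branching=0.2, r_max=10.0, seed=0):
    """Garnet-style instance: for each (s, a), spread probability mass over
    a small random subset of next states; rewards uniform in [0, r_max]."""
    rng = np.random.default_rng(seed)
    n_branch = max(1, int(round(branching * n_states)))
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            support = rng.choice(n_states, size=n_branch, replace=False)
            P[s, a, support] = rng.dirichlet(np.ones(n_branch))
    R = rng.uniform(0.0, r_max, size=(n_states, n_actions))
    return P, R

def perturbed_kernels(P0, n_kernels, scale=0.05, seed=1):
    """Sample kernels p^1, ..., p^N as small random perturbations of the
    nominal kernel p^0 (illustrative; not the paper's exact procedure)."""
    rng = np.random.default_rng(seed)
    kernels = []
    for _ in range(n_kernels):
        P = P0 + scale * rng.uniform(size=P0.shape) * (P0 > 0)
        kernels.append(P / P.sum(axis=-1, keepdims=True))
    return kernels
```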
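
The setup row quotes a value-iteration stopping rule, ‖v − v′‖ ≤ 2λϵ(1 − λ)⁻¹. The sketch below applies that rule around an arbitrary Bellman operator; the nominal (non-robust) operator used here is only a stand-in for the paper's distributionally robust update, and λ = 0.8, ϵ = 0.1 follow the quoted setup.

```python
import numpy as np

def value_iteration(bellman_update, v0, lam=0.8, eps=0.1, max_iter=10_000):
    """Value iteration with the quoted stopping rule:
    stop once ||v_new - v||_inf <= 2 * lam * eps / (1 - lam)."""
    tol = 2.0 * lam * eps / (1.0 - lam)
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        v_new = bellman_update(v)
        if np.max(np.abs(v_new - v)) <= tol:
            return v_new
        v = v_new
    return v

def nominal_bellman(P, R, lam=0.8):
    # Stand-in (non-robust) Bellman operator for a nominal kernel P and reward R;
    # the paper's distributionally robust update would be plugged in here instead.
    return lambda v: np.max(R + lam * P @ v, axis=1)

# Example, using the Garnet generator sketched above:
# P, R = garnet_mdp(n_states=30, n_actions=5)
# v = value_iteration(nominal_bellman(P, R), v0=np.zeros(30))
```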