Distributionally Robust Optimization with Markovian Data

Authors: Mengmeng Li, Tobias Sutter, Daniel Kuhn

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Numerical experiments indicate that our approach has better computational and statistical properties than the state-of-the-art methods. [...] We now validate the out-of-sample guarantees of the proposed Markov DRO approach experimentally when the observable data is indeed generated by ergodic Markov chains and compare our method against three state-of-the-art approaches to data-driven decision making from the literature. [...] Figures 2 and 4 show that the proposed Markov DRO method outperforms the SAA scheme and the DRO approach by Van Parys et al. (2021) tailored to i.i.d. data (denoted as i.i.d. DRO ) in that its out-of-sample risk displays a smaller mean as well as a smaller variance.
Researcher Affiliation Academia Mengmeng Li 1 Tobias Sutter 1 Daniel Kuhn 1 [...] 1Risk Analytics and Optimization Chair, Ecole Polytechnique F ed erale de Lausanne. Correspondence to: Mengmeng Li <mengmeng.li@epfl.ch>.
Pseudocode Yes Algorithm 1 Frank-Wolfe algorithm for solving (7) [...] Algorithm 2 Solution of the oracle subproblem [...] The exact algorithm that we use to solve (4b) is outlined in Algorithm 3 in Appendix E.
Open Source Code Yes The Matlab code for reproducing all results is available from https://github.com/mkvdro/DRO_Markov.
Open Datasets Yes The second experiment is based on a marketing dataset from Kaggle,2 which tracks the purchasing behavior of 2,000 customers with respect to d = 5 brands of chocolates. [...] 2https://www.kaggle.com/khalidnasereddin/retail-dataset-analysis
Dataset Splits No The paper does not explicitly provide details about training/validation/test splits with percentages, sample counts, or references to predefined splits for the datasets used in experiments. It refers to "training sample sizes T" but not how a larger dataset is partitioned.
Hardware Specification Yes The problem instances are modelled in MATLAB, and all experiments are run on an Intel i5-5257U CPU (2.7GHz) computer with 16GB RAM.
Software Dependencies No The paper states that "The problem instances are modelled in MATLAB" but does not specify the version number of MATLAB or any other software dependencies with version numbers.
Experiment Setup Yes In the first experiment we solve a random instance of the revenue maximization problem with n = 5 customer groups and d = 10 brands, where the weight vector w and the price vector a are sampled from the uniform distributions on n and {1, . . . , 10}d, respectively. [...] The second experiment is based on a marketing dataset from Kaggle [...] The customers are clustered into n = 5 groups based on their age, education level and income by using the K-means++ clustering algorithm by Vassilvitskii & Arthur (2006). [...] To this end, we solve 10 instances of the worst-case expectation problem (11) with rate parameter r = 1 for a fixed decision x sampled from the uniform distribution on {0, 1}d. The 10 instances involve independent training sets of size T = 5,000, each of which is sampled from the same fixed Markov chain constructed as in the experiments with synthetic data.