Distributionally Robust Optimization with Markovian Data
Authors: Mengmeng Li, Tobias Sutter, Daniel Kuhn
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments indicate that our approach has better computational and statistical properties than the state-of-the-art methods. [...] We now validate the out-of-sample guarantees of the proposed Markov DRO approach experimentally when the observable data is indeed generated by ergodic Markov chains and compare our method against three state-of-the-art approaches to data-driven decision making from the literature. [...] Figures 2 and 4 show that the proposed Markov DRO method outperforms the SAA scheme and the DRO approach by Van Parys et al. (2021) tailored to i.i.d. data (denoted as i.i.d. DRO ) in that its out-of-sample risk displays a smaller mean as well as a smaller variance. |
| Researcher Affiliation | Academia | Mengmeng Li 1 Tobias Sutter 1 Daniel Kuhn 1 [...] 1Risk Analytics and Optimization Chair, Ecole Polytechnique F ed erale de Lausanne. Correspondence to: Mengmeng Li <mengmeng.li@epfl.ch>. |
| Pseudocode | Yes | Algorithm 1 Frank-Wolfe algorithm for solving (7) [...] Algorithm 2 Solution of the oracle subproblem [...] The exact algorithm that we use to solve (4b) is outlined in Algorithm 3 in Appendix E. |
| Open Source Code | Yes | The Matlab code for reproducing all results is available from https://github.com/mkvdro/DRO_Markov. |
| Open Datasets | Yes | The second experiment is based on a marketing dataset from Kaggle,2 which tracks the purchasing behavior of 2,000 customers with respect to d = 5 brands of chocolates. [...] 2https://www.kaggle.com/khalidnasereddin/retail-dataset-analysis |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test splits with percentages, sample counts, or references to predefined splits for the datasets used in experiments. It refers to "training sample sizes T" but not how a larger dataset is partitioned. |
| Hardware Specification | Yes | The problem instances are modelled in MATLAB, and all experiments are run on an Intel i5-5257U CPU (2.7GHz) computer with 16GB RAM. |
| Software Dependencies | No | The paper states that "The problem instances are modelled in MATLAB" but does not specify the version number of MATLAB or any other software dependencies with version numbers. |
| Experiment Setup | Yes | In the first experiment we solve a random instance of the revenue maximization problem with n = 5 customer groups and d = 10 brands, where the weight vector w and the price vector a are sampled from the uniform distributions on n and {1, . . . , 10}d, respectively. [...] The second experiment is based on a marketing dataset from Kaggle [...] The customers are clustered into n = 5 groups based on their age, education level and income by using the K-means++ clustering algorithm by Vassilvitskii & Arthur (2006). [...] To this end, we solve 10 instances of the worst-case expectation problem (11) with rate parameter r = 1 for a fixed decision x sampled from the uniform distribution on {0, 1}d. The 10 instances involve independent training sets of size T = 5,000, each of which is sampled from the same fixed Markov chain constructed as in the experiments with synthetic data. |