Bayesian Risk Markov Decision Processes

Authors: Yifan Lin, Yuxuan Ren, Enlu Zhou

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the empirical performance of the BR-MDP formulation and proposed algorithms on a gambler's betting problem and an inventory control problem.
Researcher Affiliation | Academia | Yifan Lin, Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA, ylin429@gatech.edu; Yuxuan Ren, Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA, yren79@gatech.edu; Enlu Zhou, Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA, enlu.zhou@isye.gatech.edu
Pseudocode | Yes | "Algorithm 1: Exact dynamic programming for finite-horizon BR-MDPs" and "Algorithm 2: Approximate dynamic programming for finite-horizon CVaR BR-MDPs".
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | No | The paper mentions 'The data set consists of historical betting records with size N.' and 'The data set consists of historical customer demands with size N.', but it does not provide a link, citation, or name of a publicly available dataset for these historical records.
Dataset Splits | Yes | This is referred to as one replication, and we repeat the experiments for 100 replications on different independent data sets.
Hardware Specification | No | The author checklist answers 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]', but the main text of the paper gives no specific hardware details (such as GPU models, CPU types, or memory).
Software Dependencies | No | The paper does not name specific software with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9') that would be necessary for reproduction.
Experiment Setup | Yes | The gambler bets for T = 6 rounds. The cost at each time stage is aξ, where a stands for the action of how much to bet, ξ = 2 stands for a win, ξ = 1 stands for a loss, and the winning rate θc = P(ξ = 2) is unknown. We add a constant c = 10 to the stage cost so that the adjusted cost is non-negative (the algorithm requires a non-negative stage-wise cost), and then subtract cT from the resulting total adjusted cost to recover the total cost.
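The constant-shift step described in the experiment setup can be illustrated with a minimal sketch. It assumes only what the row states: a constant c = 10 is added to each of T = 6 stage costs, and cT is subtracted from the total afterwards. The function names and the raw stage costs below are hypothetical and chosen purely for illustration; they are not taken from the paper or its code.

```python
from typing import List, Sequence


def shift_costs(raw_costs: Sequence[float], c: float) -> List[float]:
    """Add the constant c to every stage cost and check non-negativity."""
    adjusted = [c + x for x in raw_costs]
    if min(adjusted) < 0:
        raise ValueError("c is too small: an adjusted stage cost is negative")
    return adjusted


def recover_total(adjusted_total: float, c: float, horizon: int) -> float:
    """Undo the shift on the total: subtract c once per stage."""
    return adjusted_total - c * horizon


T = 6      # number of betting rounds, as in the experiment setup
c = 10.0   # shift constant, as in the experiment setup

# Hypothetical raw stage costs (made-up values, some negative).
raw_costs = [-2.0, 1.0, -4.0, 3.0, 0.0, -1.0]

adjusted = shift_costs(raw_costs, c)          # every entry is now >= 0
total = recover_total(sum(adjusted), c, T)    # equals sum(raw_costs)
```

Because the same constant is added at every stage regardless of the action, the shift changes every policy's total cost by exactly cT, so subtracting cT at the end recovers the original total.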