Bayesian Risk-Averse Q-Learning with Streaming Observations

Authors: Yuhao Wang, Enlu Zhou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Numerical Experiments
Researcher Affiliation | Academia | Yuhao Wang, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, yuhaowang@gatech.edu; Enlu Zhou, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, enlu.zhou@isye.gatech.edu
Pseudocode | Yes | Algorithm 1: Multi-stage Bayesian risk-averse Q-learning
Open Source Code | No | The paper does not include an explicit statement about releasing the code for its proposed methodology or a link to a code repository.
Open Datasets | No | The paper uses simulated environments described as the "Coin Toss" and "Inventory Management" examples. While it mentions a "historical data set" for the Inventory Management problem, it does not provide concrete access information (link, DOI, repository, or formal citation) for any publicly available or open dataset used for training.
Dataset Splits | No | The paper describes batch sizes for streaming observations (e.g., "batch size n(t) = 1", "n(0) = 10") but does not provide explicit training/validation/test dataset splits with percentages or sample counts for fixed datasets. The experimental setup indicates a dynamic, streaming data environment rather than static splits.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or types of computing resources used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or solvers used for implementing its algorithms or running experiments.
Experiment Setup | Yes | Coin toss. We consider stage-wise streaming observations with batch size n(t) = 1 and stage-wise Q-learning with number of steps m(t) = 1. The minimal sample size to estimate the Bellman operator is N = 10. The initial observed data batch size is n(0) = 10. We set the radius of the KL ball and the Wasserstein ball to be 0.1. The risk level for VaR and CVaR is set to 0.2 in Figure 2 and 0.4 in Figure 3. Inventory Management. In Figure 4, we set K = 10, T = 60, m(t) = n(t) = 5, n(0) = 20, and N = 10. The radius of the KL and the Wasserstein balls is 0.05, and the risk level for VaR and CVaR is 0.2.
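
For anyone attempting a reproduction, the settings quoted in the Experiment Setup row can be collected into one small configuration record. The sketch below is only an illustration: the class name `ExperimentSetup` and all field names are hypothetical and do not come from the paper, which releases no code. K and T are stored exactly as quoted, without interpreting what they denote.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the settings quoted in the table above."""
    name: str
    n0: int                          # initial observed data batch size n(0)
    n_t: int                         # streaming batch size per stage n(t)
    m_t: int                         # Q-learning steps per stage m(t)
    N: int                           # minimal sample size to estimate the Bellman operator
    ball_radius: float               # radius of the KL and Wasserstein balls
    risk_levels: Tuple[float, ...]   # VaR/CVaR risk levels, one per reported figure
    K: Optional[int] = None          # quoted for Inventory Management only
    T: Optional[int] = None          # quoted for Inventory Management only


# Coin toss: risk level 0.2 (Figure 2) and 0.4 (Figure 3).
COIN_TOSS = ExperimentSetup(
    name="coin_toss", n0=10, n_t=1, m_t=1, N=10,
    ball_radius=0.1, risk_levels=(0.2, 0.4),
)

# Inventory Management (Figure 4).
INVENTORY = ExperimentSetup(
    name="inventory_management", n0=20, n_t=5, m_t=5, N=10,
    ball_radius=0.05, risk_levels=(0.2,), K=10, T=60,
)
```

Keeping the record frozen is a design choice made here so that a run cannot silently change a reported setting; anything the paper leaves unspecified (random seeds, step sizes, discount factor) is deliberately left out rather than guessed.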