Bayesian Risk-Averse Q-Learning with Streaming Observations
Authors: Yuhao Wang, Enlu Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Numerical Experiments |
| Researcher Affiliation | Academia | Yuhao Wang, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, yuhaowang@gatech.edu; Enlu Zhou, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, enlu.zhou@isye.gatech.edu |
| Pseudocode | Yes | Algorithm 1 Multi-stage Bayesian risk-averse Q-learning |
| Open Source Code | No | The paper does not include an explicit statement about releasing the code for its proposed methodology or a link to a code repository. |
| Open Datasets | No | The paper uses simulated environments described as "Coin Toss" and "Inventory Management" examples. While it mentions "historical data set" for the Inventory Management problem, it does not provide concrete access information (link, DOI, repository, or formal citation) for any publicly available or open dataset used for training. |
| Dataset Splits | No | The paper describes batch sizes for streaming observations (e.g., "batch size n(t) = 1", "n(0) = 10") but does not provide explicit training/validation/test dataset splits with percentages or sample counts for fixed datasets. The experimental setup indicates a dynamic, streaming data environment rather than static splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or types of computing resources used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or solvers used for implementing their algorithms or running experiments. |
| Experiment Setup | Yes | Coin toss. We consider stage-wise streaming observations with batch size n(t) = 1 and stage-wise Q-learning with number of steps m(t) = 1. The minimal sample size to estimate the Bellman operator N = 10. Initial observed data batch size n(0) = 10. We set the radius of the KL ball and the Wasserstein ball to be 0.1. The risk level for VaR and CVaR is set to 0.2 in Figure 2 and 0.4 in Figure 3. Inventory Management. In Figure 4, we set K = 10, T = 60, m(t) = n(t) = 5, n(0) = 20, and N = 10. The radius of the KL and the Wasserstein ball is 0.05, and the risk level for VaR and CVaR is 0.2. |
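
Since no code is released, anyone reproducing the experiments has to transcribe the reported settings by hand. The sketch below collects the values from the Experiment Setup row into one place. The class name, field names, and the `observations_seen` helper are hypothetical and not taken from the paper; the helper only assumes the constant per-stage batch size n(t) stated above. K and T are stored without interpretation because this summary does not define them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExperimentSetup:
    """Reported settings for one experiment (field names are illustrative)."""
    batch_size: int       # n(t): streaming observations per stage
    q_steps: int          # m(t): Q-learning steps per stage
    min_samples: int      # N: minimal sample size to estimate the Bellman operator
    initial_batch: int    # n(0): initial observed data batch size
    ball_radius: float    # radius of the KL ball and the Wasserstein ball
    risk_levels: dict     # VaR/CVaR risk level, keyed by figure
    extra: dict = field(default_factory=dict)  # values reported without a definition here


COIN_TOSS = ExperimentSetup(
    batch_size=1, q_steps=1, min_samples=10, initial_batch=10,
    ball_radius=0.1, risk_levels={"Figure 2": 0.2, "Figure 3": 0.4},
)

INVENTORY = ExperimentSetup(
    batch_size=5, q_steps=5, min_samples=10, initial_batch=20,
    ball_radius=0.05, risk_levels={"Figure 4": 0.2},
    extra={"K": 10, "T": 60},  # meaning of K and T is not given in this summary
)


def observations_seen(setup: ExperimentSetup, stages: int) -> int:
    """Total observations after `stages` stages, assuming a constant n(t)."""
    return setup.initial_batch + stages * setup.batch_size
```

For example, `observations_seen(COIN_TOSS, 5)` adds five single-observation stages to the initial batch of 10. Hardware, software versions, and random seeds are not reported, so they are deliberately absent from the sketch.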