Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions
Authors: Tongxin Li, Yiheng Lin, Shaolei Ren, Adam Wierman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees, which provably improve on what can be obtained solely with black-box advice. Our main results characterize the tradeoff between consistency and robustness for both black-box and grey-box settings in terms of the ratio of expectations, RoE, built upon the traditional consistency and robustness metrics in [3, 22, 23, 4] for the competitive ratio. |
| Researcher Affiliation | Academia | Tongxin Li, School of Data Science, CUHK-SZ, China, litongxin@cuhk.edu.cn; Yiheng Lin, Computing + Mathematical Sciences, Caltech, USA, yihengl@caltech.edu; Shaolei Ren, Electrical & Computer Engineering, UC Riverside, USA, shaolei@ucr.edu; Adam Wierman, Computing + Mathematical Sciences, Caltech, USA, adamw@caltech.edu |
| Pseudocode | Yes | Algorithm 1 PROjection Pursuit Policy (PROP) |
| Open Source Code | No | The paper does not include a statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper presents theoretical results and algorithms and does not involve empirical evaluation on a dataset, so there is no dataset to make publicly available. |
| Dataset Splits | No | The paper is theoretical and does not perform empirical evaluations on datasets, hence no training/test/validation splits are described. |
| Hardware Specification | No | The paper is theoretical and does not report on computational experiments that would require specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not implement or report on software usage with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experimental setup details such as hyperparameters or training configurations. |