Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions

Authors: Tongxin Li, Yiheng Lin, Shaolei Ren, Adam Wierman

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees, which provably improve on what can be obtained solely with black-box advice. Our main results characterize the tradeoff between consistency and robustness for both black-box and grey-box settings in terms of the ratio of expectations, RoE, built upon the traditional consistency and robustness metrics in [3, 22, 23, 4] for the competitive ratio.
Researcher Affiliation | Academia | Tongxin Li, School of Data Science, CUHK-SZ, China, litongxin@cuhk.edu.cn; Yiheng Lin, Computing + Mathematical Sciences, Caltech, USA, yihengl@caltech.edu; Shaolei Ren, Electrical & Computer Engineering, UC Riverside, USA, shaolei@ucr.edu; Adam Wierman, Computing + Mathematical Sciences, Caltech, USA, adamw@caltech.edu
Pseudocode | Yes | Algorithm 1: PROjection Pursuit Policy (PROP)
Open Source Code | No | The paper does not include a statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper presents theoretical results and algorithms and does not involve empirical evaluation on a dataset, so there is no dataset to make publicly available.
Dataset Splits | No | The paper is theoretical and does not perform empirical evaluations on datasets, so no training/validation/test splits are described.
Hardware Specification | No | The paper is theoretical and does not report computational experiments that would require specific hardware.
Software Dependencies | No | The paper is theoretical and does not implement or report on software with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe empirical experimental setup details such as hyperparameters or training configurations.