Dynamic Planning and Learning under Recovering Rewards
Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | we propose, construct and prove performance guarantees for a class of Purely Periodic Policies. For the online problem when the model parameters are unknown and need to be learned, we design an Upper Confidence Bound (UCB) based policy that approximately has Õ(N√T) regret against the offline benchmark. Our framework and policy design may have the potential to be adapted into other offline planning and online learning applications with non-stationary and recovering rewards. We would also like to conduct experiments to see the practical performance of our policies for various application needs. |
| Researcher Affiliation | Academia | 1Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Massachusetts, USA 2Department of Industrial Engineering and Operations Research, University of California, Berkeley, USA. |
| Pseudocode | Yes | Algorithm 1 Offline Purely Periodic Planning |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments with specific datasets, thus it does not mention whether any datasets are publicly available. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments with datasets, so it does not discuss dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not report on empirical experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe implementation details or report on empirical experiments, therefore no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments or their setup, including hyperparameters or training configurations. |
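Since the paper reports no experiments, the row above is consistent with the rest of the table. For readers unfamiliar with the setting, the following toy sketch illustrates the kind of problem the paper studies: a UCB-style index policy facing arms whose expected reward recovers the longer they sit idle. The reward model, parameters, and function names here are illustrative assumptions only, not the paper's actual policy or guarantees.

```python
import math
import random

def recovering_mean(base, decay, idle):
    """Hypothetical recovering reward: the mean climbs back toward
    `base` with idle time (illustrative, not from the paper)."""
    return base * (1.0 - decay ** idle)

def run_ucb(n_arms=3, horizon=2000, seed=0):
    rng = random.Random(seed)
    bases = [0.3, 0.5, 0.8][:n_arms]   # per-arm reward ceilings (assumed)
    decay = 0.5                        # recovery speed (assumed)
    counts = [0] * n_arms              # pulls per arm
    means = [0.0] * n_arms             # empirical mean reward per arm
    idle = [1] * n_arms                # rounds since each arm was last pulled
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                # initialize: pull each arm once
        else:
            # standard UCB1 index; a naive choice here, since it ignores
            # the recovery structure the paper's policies exploit
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        mu = recovering_mean(bases[arm], decay, idle[arm])
        reward = 1.0 if rng.random() < mu else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
        for i in range(n_arms):
            idle[i] = 1 if i == arm else idle[i] + 1
    return total / horizon  # average reward over the horizon
```

Running `run_ucb()` returns the policy's average per-round reward; because rewards recover with idle time, a policy that deliberately rests arms (as the paper's Purely Periodic Policies do) can outperform this naive index rule.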