Dynamic Planning and Learning under Recovering Rewards

Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We propose, construct and prove performance guarantees for a class of Purely Periodic Policies. For the online problem, when the model parameters are unknown and need to be learned, we design an Upper Confidence Bound (UCB) based policy that approximately has Õ(N√T) regret against the offline benchmark. Our framework and policy design may have the potential to be adapted into other offline planning and online learning applications with non-stationary and recovering rewards. We would also like to conduct experiments to see the practical performance of our policies for various application needs."
Researcher Affiliation | Academia | "1. Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Massachusetts, USA; 2. Department of Industrial Engineering and Operations Research, University of California, Berkeley, USA."
Pseudocode | Yes | "Algorithm 1: Offline Purely Periodic Planning"
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets, so it does not mention whether any datasets are publicly available.
Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments, so it does not discuss training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and reports no empirical experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and describes no implementation details, so no software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper is theoretical and describes no empirical experiments, so no setup details such as hyperparameters or training configurations are given.
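Since the paper releases no code, the sketch below illustrates the general idea behind a UCB-style policy under recovering rewards, where an arm's expected reward depends on how long it has rested since its last pull. This is a hypothetical illustration, not the authors' Algorithm 1 or their UCB policy: the class name, the idle-time cap `d_max`, and the reward model in the test are all assumptions made for the example.

```python
import math

class RecoveringUCB:
    """Illustrative UCB sketch for a bandit whose arms "recover":
    the mean reward of an arm depends on its idle time (capped at d_max).
    Keeps one empirical-mean/UCB estimate per (arm, idle-time) pair."""

    def __init__(self, n_arms, d_max):
        self.n_arms = n_arms
        self.d_max = d_max
        # counts[a][d] / sums[a][d]: pulls and total reward of arm a at idle time d
        self.counts = [[0] * (d_max + 1) for _ in range(n_arms)]
        self.sums = [[0.0] * (d_max + 1) for _ in range(n_arms)]
        self.idle = [d_max] * n_arms  # every arm starts fully recovered
        self.t = 0

    def select(self):
        """Pick the arm with the highest UCB at its current idle time."""
        self.t += 1
        best, best_ucb = 0, float("-inf")
        for a in range(self.n_arms):
            d = min(self.idle[a], self.d_max)
            n = self.counts[a][d]
            if n == 0:
                return a  # force exploration of an unseen (arm, idle) pair
            mean = self.sums[a][d] / n
            ucb = mean + math.sqrt(2.0 * math.log(self.t) / n)
            if ucb > best_ucb:
                best, best_ucb = a, ucb
        return best

    def update(self, arm, reward):
        """Record the observed reward and advance every arm's idle clock."""
        d = min(self.idle[arm], self.d_max)
        self.counts[arm][d] += 1
        self.sums[arm][d] += reward
        for a in range(self.n_arms):
            self.idle[a] = 0 if a == arm else min(self.idle[a] + 1, self.d_max)
```

In this toy version, the design choice of indexing estimates by (arm, idle time) mirrors the recovering-reward structure: the policy learns a separate mean for each recovery level rather than a single stationary mean per arm.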