reproducibilityindex.ai

Dynamic Planning and Learning under Recovering Rewards

Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	we propose, construct and prove performance guarantees for a class of Purely Periodic Policies. For the online problem when the model parameters are unknown and need to be learned, we design an Upper Confidence Bound (UCB) based policy that approximately has Õ(N√T) regret against the offline benchmark. Our framework and policy design may have the potential to be adapted into other offline planning and online learning applications with non-stationary and recovering rewards. Also, we would also like to conduct experiments to see the practical performance of our policies for various application needs.
Researcher Affiliation	Academia	(1) Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Massachusetts, USA; (2) Department of Industrial Engineering and Operations Research, University of California, Berkeley, USA.
Pseudocode	Yes	Algorithm 1: Offline Purely Periodic Planning
Open Source Code	No	The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets	No	The paper is theoretical and does not conduct experiments with specific datasets, thus it does not mention whether any datasets are publicly available.
Dataset Splits	No	The paper is theoretical and does not conduct empirical experiments with datasets, so it does not discuss dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not report on empirical experiments, therefore no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and does not describe implementation details or report on empirical experiments, therefore no specific software dependencies with version numbers are mentioned.
Experiment Setup	No	The paper is theoretical and does not describe any empirical experiments or their setup, including hyperparameters or training configurations.