Dynamic Regret of Online Markov Decision Processes

Authors: Peng Zhao, Long-Fei Li, Zhi-Hua Zhou

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | For the three models, we propose novel online ensemble algorithms and establish their dynamic regret guarantees respectively, in which the results for episodic (loop-free) SSP are provably minimax optimal in terms of time horizon and certain non-stationarity measure.
Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University. Correspondence to: Zhi-Hua Zhou <zhouzh@lamda.nju.edu.cn>.
Pseudocode | Yes | Algorithm 1: DO-REPS; Algorithm 2: CODO-REPS; Algorithm 3: REDO-REPS
Open Source Code | No | The paper does not mention providing open-source code or links to a code repository for the methodology described.
Open Datasets | No | The paper focuses on theoretical contributions and algorithm design. It does not conduct experiments on datasets, thus no information about publicly available or open datasets is provided.
Dataset Splits | No | The paper is theoretical and does not involve empirical validation with datasets. Therefore, no information regarding training, validation, or test dataset splits is provided.
Hardware Specification | No | The paper focuses on theoretical algorithms and their guarantees. It does not mention any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs. It does not mention any specific software dependencies with version numbers required for implementation or execution.
Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, hyperparameters, or training configurations.