Dynamic Regret of Online Markov Decision Processes
Authors: Peng Zhao, Long-Fei Li, Zhi-Hua Zhou
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | For the three models, we propose novel online ensemble algorithms and establish their dynamic regret guarantees respectively, in which the results for episodic (loop-free) SSP are provably minimax optimal in terms of time horizon and certain non-stationarity measure. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University. Correspondence to: Zhi-Hua Zhou <zhouzh@lamda.nju.edu.cn>. |
| Pseudocode | Yes | Algorithm 1: DO-REPS; Algorithm 2: CODO-REPS; Algorithm 3: REDO-REPS |
| Open Source Code | No | The paper does not mention providing open-source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper focuses on theoretical contributions and algorithm design. It does not conduct experiments on datasets, thus no information about publicly available or open datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation with datasets. Therefore, no information regarding training, validation, or test dataset splits is provided. |
| Hardware Specification | No | The paper focuses on theoretical algorithms and their guarantees. It does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs. It does not mention any specific software dependencies with version numbers required for implementation or execution. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, hyperparameters, or training configurations. |