Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation
Authors: Long-Fei Li, Peng Zhao, Zhi-Hua Zhou
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a general framework to decouple the two sources of uncertainties and show the dynamic regret bound naturally decomposes into two terms... We provide dynamic regret guarantees respectively and show they are optimal in terms of the number of episodes $K$ and the non-stationarity $P_K$ by establishing matching lower bounds. |
| Researcher Affiliation | Academia | Long-Fei Li, Peng Zhao, Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; {lilf, zhaop, zhouzh}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1: Overall Algorithm framework |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not describe experiments that would use a dataset, therefore no information about dataset availability or access is provided. |
| Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design rather than empirical evaluation, and thus does not describe any dataset splits for validation. |
| Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup or the hardware used for computations. |
| Software Dependencies | No | The paper does not describe an implementation or provide details on specific software dependencies and their versions. |
| Experiment Setup | No | The paper focuses on theoretical aspects, algorithm design, and proofs, without detailing a specific experimental setup or hyperparameters. |