Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation

Authors: Long-Fei Li, Peng Zhao, Zhi-Hua Zhou

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose a general framework to decouple the two sources of uncertainties and show the dynamic regret bound naturally decomposes into two terms... We provide dynamic regret guarantees respectively and show they are optimal in terms of the number of episodes K and the non-stationarity P_K by establishing matching lower bounds.
Researcher Affiliation | Academia | Long-Fei Li, Peng Zhao, Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; {lilf, zhaop, zhouzh}@lamda.nju.edu.cn
Pseudocode | Yes | Algorithm 1: Overall Algorithm framework
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper is theoretical and does not describe experiments that would use a dataset, therefore no information about dataset availability or access is provided.
Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design rather than empirical evaluation, and thus does not describe any dataset splits for validation.
Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup or the hardware used for computations.
Software Dependencies | No | The paper does not describe an implementation or provide details on specific software dependencies and their versions.
Experiment Setup | No | The paper focuses on theoretical aspects, algorithm design, and proofs, without detailing a specific experimental setup or hyperparameters.