Online Learning in Markov Decision Processes with Changing Cost Sequences

Authors: Travis Dick, Andras Gyorgy, Csaba Szepesvari

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper we consider online learning in finite Markov decision processes (MDPs) with changing cost sequences under full and banditinformation. We propose to view this problem as an instance of online linear optimization. We propose two methods for this problem: MD2 (mirror descent with approximate projections) and the continuous exponential weights algorithm with Dikin walks. We provide a rigorous complexity analysis of these techniques, while providing near-optimal regret-bounds (in particular, we take into account the computational costs of performing approximate projections in MD2).
Researcher Affiliation Academia Travis Dick TDICK@UALBERTA.CA Andr as Gy orgy GYORGY@UALBERTA.CA Csaba Szepesv ari SZEPESVA@UALBERTA.CA Department of Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8
Pseudocode No The paper describes algorithms mathematically (e.g., in Section 3 and 4) but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code No No statement explicitly providing access to open-source code for the described methodology (e.g., a repository link or a statement about code release) was found in the paper.
Open Datasets No The paper is theoretical, focusing on algorithm design and complexity analysis. It does not conduct empirical experiments or mention the use of datasets for training. Therefore, no information on public dataset access is provided.
Dataset Splits No The paper is theoretical and does not describe empirical experiments with data. Therefore, no information regarding training, validation, or test dataset splits is provided.
Hardware Specification No The paper is theoretical and focuses on algorithm design and analysis. It does not describe any empirical experiments or the hardware used to run them.
Software Dependencies No The paper focuses on theoretical algorithm design and analysis. It does not mention any specific software dependencies or versions required for empirical implementation or replication.
Experiment Setup No The paper is theoretical and does not include details about an empirical experimental setup, such as specific hyperparameter values, model initialization, or training schedules.