Online Learning in Markov Decision Processes with Changing Cost Sequences
Authors: Travis Dick, Andras Gyorgy, Csaba Szepesvari
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper we consider online learning in finite Markov decision processes (MDPs) with changing cost sequences under full and banditinformation. We propose to view this problem as an instance of online linear optimization. We propose two methods for this problem: MD2 (mirror descent with approximate projections) and the continuous exponential weights algorithm with Dikin walks. We provide a rigorous complexity analysis of these techniques, while providing near-optimal regret-bounds (in particular, we take into account the computational costs of performing approximate projections in MD2). |
| Researcher Affiliation | Academia | Travis Dick TDICK@UALBERTA.CA Andr as Gy orgy GYORGY@UALBERTA.CA Csaba Szepesv ari SZEPESVA@UALBERTA.CA Department of Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8 |
| Pseudocode | No | The paper describes algorithms mathematically (e.g., in Section 3 and 4) but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | No | No statement explicitly providing access to open-source code for the described methodology (e.g., a repository link or a statement about code release) was found in the paper. |
| Open Datasets | No | The paper is theoretical, focusing on algorithm design and complexity analysis. It does not conduct empirical experiments or mention the use of datasets for training. Therefore, no information on public dataset access is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with data. Therefore, no information regarding training, validation, or test dataset splits is provided. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and analysis. It does not describe any empirical experiments or the hardware used to run them. |
| Software Dependencies | No | The paper focuses on theoretical algorithm design and analysis. It does not mention any specific software dependencies or versions required for empirical implementation or replication. |
| Experiment Setup | No | The paper is theoretical and does not include details about an empirical experimental setup, such as specific hyperparameter values, model initialization, or training schedules. |