reproducibilityindex.ai

Online Learning in Markov Decision Processes with Changing Cost Sequences

Authors: Travis Dick, Andras Gyorgy, Csaba Szepesvari

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper we consider online learning in ﬁnite Markov decision processes (MDPs) with changing cost sequences under full and banditinformation. We propose to view this problem as an instance of online linear optimization. We propose two methods for this problem: MD2 (mirror descent with approximate projections) and the continuous exponential weights algorithm with Dikin walks. We provide a rigorous complexity analysis of these techniques, while providing near-optimal regret-bounds (in particular, we take into account the computational costs of performing approximate projections in MD2).
Researcher Affiliation	Academia	Travis Dick TDICK@UALBERTA.CA Andr as Gy orgy GYORGY@UALBERTA.CA Csaba Szepesv ari SZEPESVA@UALBERTA.CA Department of Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8
Pseudocode	No	The paper describes algorithms mathematically (e.g., in Section 3 and 4) but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code	No	No statement explicitly providing access to open-source code for the described methodology (e.g., a repository link or a statement about code release) was found in the paper.
Open Datasets	No	The paper is theoretical, focusing on algorithm design and complexity analysis. It does not conduct empirical experiments or mention the use of datasets for training. Therefore, no information on public dataset access is provided.
Dataset Splits	No	The paper is theoretical and does not describe empirical experiments with data. Therefore, no information regarding training, validation, or test dataset splits is provided.
Hardware Specification	No	The paper is theoretical and focuses on algorithm design and analysis. It does not describe any empirical experiments or the hardware used to run them.
Software Dependencies	No	The paper focuses on theoretical algorithm design and analysis. It does not mention any specific software dependencies or versions required for empirical implementation or replication.
Experiment Setup	No	The paper is theoretical and does not include details about an empirical experimental setup, such as specific hyperparameter values, model initialization, or training schedules.