Learning Multiple Markov Chains via Adaptive Allocation

Authors: Mohammad Sadegh Talebi, Odalric-Ambrym Maillard

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of learning the transition matrices of a set of Markov chains from a single stream of observations on each chain. We present a novel learning algorithm that efficiently balances exploration and exploitation intrinsic to this problem, without any prior knowledge of the chains. We provide finite-sample PAC-type guarantees on the performance of the algorithm. Further, we show that our algorithm asymptotically attains an optimal loss. All proofs are provided in the supplementary material.
Researcher Affiliation | Academia | Mohammad Sadegh Talebi, SequeL Team, Inria Lille Nord Europe (sadegh.talebi@inria.fr); Odalric-Ambrym Maillard, SequeL Team, Inria Lille Nord Europe (odalric.maillard@inria.fr)
Pseudocode | Yes | Algorithm 1: BA-MC (Bandit Allocation for Markov Chains)
Open Source Code | No | The paper is theoretical and focuses on algorithm design and proofs; it does not mention the release of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not involve experimental data or datasets, so it does not mention public availability of a training dataset.
Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets; there is no mention of training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and does not describe any experiments or the hardware used to run them.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies or versions for implementation or experimentation.
Experiment Setup | No | The paper is theoretical and presents an algorithm with performance guarantees; it does not describe an experimental setup with hyperparameters or system-level training settings.
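The abstract describes learning the transition matrices of several Markov chains from one observation stream per chain, where the learner adaptively decides which chain to sample next. The sketch below illustrates that setting only; it is not the paper's BA-MC algorithm. The two example chains, the variance-based allocation score, the Laplace smoothing, and the sample budget are all hypothetical choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth chains (unknown to the learner); any
# row-stochastic matrices would do here.
true_P = [
    np.array([[0.9, 0.1], [0.2, 0.8]]),
    np.array([[0.5, 0.5], [0.5, 0.5]]),
]
K = len(true_P)          # number of chains
S = true_P[0].shape[0]   # number of states per chain

def step(k, s):
    """Sample the next state of chain k from its current state s."""
    return rng.choice(S, p=true_P[k][s])

counts = [np.ones((S, S)) for _ in range(K)]  # Laplace-smoothed transition counts
states = [0] * K                              # current state of each chain

for t in range(5000):
    # Hypothetical allocation heuristic (a stand-in for the paper's rule):
    # score each chain by a plug-in variance proxy, summing per-state
    # row variance divided by that state's visit count, and advance the
    # chain whose estimate currently looks least accurate.
    scores = []
    for c in counts:
        n = c.sum(axis=1)       # visits per state
        p_hat = c / n[:, None]  # empirical transition rows
        scores.append(np.sum(np.sum(p_hat * (1 - p_hat), axis=1) / n))
    k = int(np.argmax(scores))

    s = states[k]
    s_next = int(step(k, s))
    counts[k][s, s_next] += 1
    states[k] = s_next

# Empirical transition matrices: row-normalised counts.
P_hat = [c / c.sum(axis=1, keepdims=True) for c in counts]
```

Because each chain produces a single stream, the sketch keeps one current state per chain and only ever extends a chain's trajectory from that state, mirroring the single-stream constraint stated in the abstract.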