Learning Multiple Markov Chains via Adaptive Allocation
Authors: Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the problem of learning the transition matrices of a set of Markov chains from a single stream of observations on each chain. We present a novel learning algorithm that efficiently balances exploration and exploitation intrinsic to this problem, without any prior knowledge of the chains. We provide finite-sample PAC-type guarantees on the performance of the algorithm. Further, we show that our algorithm asymptotically attains an optimal loss. All proofs are provided in the supplementary material. |
| Researcher Affiliation | Academia | Mohammad Sadegh Talebi, SequeL Team, Inria Lille - Nord Europe (sadegh.talebi@inria.fr); Odalric-Ambrym Maillard, SequeL Team, Inria Lille - Nord Europe (odalric.maillard@inria.fr) |
| Pseudocode | Yes | Algorithm 1: BA-MC (Bandit Allocation for Markov Chains). An illustrative, non-authoritative sketch of this style of adaptive allocation is given after the table. |
| Open Source Code | No | The paper is theoretical and focuses on algorithm design and proofs; it does not mention the release of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not involve experimental data or datasets. Therefore, it does not mention public availability of a training dataset. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets. There is no mention of dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments or the hardware used to run them. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies or versions for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and presents an algorithm with performance guarantees. It does not describe an experimental setup with hyperparameters or system-level training settings. |
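
Since the paper's pseudocode is not reproduced here, the following is a minimal, hypothetical sketch of the kind of procedure the abstract describes: observing several Markov chains one transition at a time, adaptively choosing which chain to sample next, and maintaining empirical estimates of each transition matrix. It is not the authors' BA-MC algorithm; the least-observed-chain rule, the function names (`adaptive_allocation`, `empirical_transition_matrix`), and the callable interface for the chains are all illustrative assumptions.

```python
import numpy as np

def empirical_transition_matrix(counts):
    """Row-normalize transition counts into an estimated transition matrix."""
    row_sums = counts.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)          # avoid division by zero
    p_hat = counts / safe
    p_hat[row_sums[:, 0] == 0] = 1.0 / counts.shape[1]    # uniform rows for unvisited states
    return p_hat

def adaptive_allocation(chains, num_states, budget):
    """Spend `budget` single-step observations across `chains`.

    Each element of `chains` is a callable mapping the current state to a
    sampled next state. Returns one estimated transition matrix per chain.
    """
    k_chains = len(chains)
    counts = np.zeros((k_chains, num_states, num_states))
    states = np.zeros(k_chains, dtype=int)   # current state of each chain
    pulls = np.zeros(k_chains, dtype=int)    # observations spent per chain

    for t in range(budget):
        # Observe every chain once, then pick the least-observed chain.
        # (Hypothetical rule; the paper's BA-MC uses a data-dependent index instead.)
        k = t if t < k_chains else int(np.argmin(pulls))
        nxt = chains[k](states[k])
        counts[k, states[k], nxt] += 1
        states[k] = nxt
        pulls[k] += 1

    return [empirical_transition_matrix(counts[k]) for k in range(k_chains)]

if __name__ == "__main__":
    # Toy usage: two copies of a 2-state chain, 500 total observations.
    rng = np.random.default_rng(0)
    P = np.array([[0.9, 0.1], [0.2, 0.8]])
    chain = lambda s: int(rng.choice(2, p=P[s]))
    estimates = adaptive_allocation([chain, chain], num_states=2, budget=500)
    print(estimates[0])
```

According to the abstract, the actual algorithm balances exploration and exploitation without prior knowledge of the chains, which the naive least-observed rule above does not attempt; the sketch is only meant to illustrate the problem setting being assessed.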