Learning to Collaborate in Markov Decision Processes
Authors: Goran Radanovic, Rati Devidze, David Parkes, Adish Singla
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We design novel online learning algorithms for agent A1 whose regret decays as O(T^{max{1 − (3/7)α, 3/4}}), for T learning episodes, provided that the magnitude of the change in agent A2's policy between any two consecutive episodes is upper bounded by O(T^{−α}). Here, the parameter α is assumed to be strictly greater than 0, and we show that this assumption is necessary provided that the learning parity with noise problem is computationally hard. We show that sublinear regret of agent A1 further implies near-optimality of the agents' joint return for MDPs that manifest the properties of a smooth game. (A sketch of this regret exponent follows the table.) |
| Researcher Affiliation | Academia | Harvard University; Max Planck Institute for Software Systems (MPI-SWS). |
| Pseudocode | Yes | Algorithm 1: ExpDRBias (see the generic experts-update sketch after the table). |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not involve experimental evaluation on a dataset. |
| Dataset Splits | No | The paper is theoretical and does not involve experimental evaluation on a dataset, thus no dataset splits are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe a practical experimental setup with hyperparameters or training settings. |
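
The Research Type row quotes the paper's regret bound. As a quick illustration of why the assumption α > 0 yields sublinear regret, the following minimal Python sketch evaluates the exponent max{1 − (3/7)α, 3/4} for a few values of α; the helper name and the sampled values are this report's own, not code from the paper.

```python
# Illustrative sketch only: regret_exponent and the sampled α values are ours;
# the exponent itself comes from the paper's bound O(T^{max{1 - (3/7)α, 3/4}}).

def regret_exponent(alpha: float) -> float:
    """Exponent of T in the regret bound, given policy-change rate exponent α."""
    assert alpha > 0, "the paper assumes α is strictly greater than 0"
    return max(1.0 - (3.0 / 7.0) * alpha, 3.0 / 4.0)

for alpha in (0.1, 0.25, 7 / 12, 1.0, 2.0):
    print(f"α = {alpha:.3f} -> regret O(T^{regret_exponent(alpha):.3f})")
```

For every α > 0 the exponent stays strictly below 1, so the regret is sublinear in T; once α ≥ 7/12 the bound saturates at O(T^{3/4}).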
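
The Pseudocode row points to the paper's Algorithm 1, ExpDRBias, an experts-style online learning algorithm. The snippet below is not the paper's algorithm: it is only the standard exponential-weights (Hedge) update that experts algorithms of this kind build on, with the paper's recency-bias terms omitted, and all names here are ours.

```python
import numpy as np

def hedge_update(weights: np.ndarray, losses: np.ndarray, eta: float) -> np.ndarray:
    """One generic multiplicative-weights step: experts with higher loss lose mass."""
    w = weights * np.exp(-eta * losses)
    return w / w.sum()

# Toy usage: three experts, uniform prior, one round of losses in [0, 1].
w = np.ones(3) / 3
w = hedge_update(w, np.array([0.2, 0.9, 0.5]), eta=0.5)
print(w)  # probability mass shifts toward the low-loss first expert
```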