Reinforcement Learning with Feedback Graphs
Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study RL in the tabular MDP setting where the agent receives additional observations per step in the form of transition samples. ... We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can incorporate additional observations for more sample-efficient learning. We give a regret bound that predominantly depends on the size of the maximum acyclic subgraph of the feedback graph... Our main contributions, summarized in Table 1, are: We prove that, by incorporating the additional observations into the model estimation step, existing model-based RL algorithms [11, 12] can achieve regret and sample-complexity bounds that scale with the mas-number µ of the feedback graph... We give a lower bound on the regret (Appendix B)... We present an algorithm that overcomes the above challenges for the MDP setting and achieves a sample complexity bound that scales with the more favorable domination number γ in the leading term (Section 5). (See the first sketch below the table for the model-update idea.) |
| Researcher Affiliation | Collaboration | Christoph Dann (Google Research, cdann@cdann.net); Yishay Mansour (Tel Aviv University and Google Research, mansour.yishay@gmail.com); Mehryar Mohri (Google Research and Courant Institute of Math. Sciences, mohri@google.com); Ayush Sekhari (Cornell University, as3663@cornell.edu); Karthik Sridharan (Cornell University, ks999@cornell.edu) |
| Pseudocode | Yes | Algorithm 1: Optimistic model-based RL; Algorithm 2: SampleEpisode(π, s1, D) (see the second sketch below the table) |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the methodology described is publicly available. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical proofs and algorithms; it does not mention training on specific public datasets for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments; therefore, no dataset splits (training, validation, test) are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or the specific hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup, hence no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithms and proofs; it does not include details about an experimental setup, hyperparameters, or training configurations. |
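
The Research Type row describes the paper's central idea: folding feedback-graph side observations into the model estimation step of a model-based algorithm. The snippet below is a minimal, hypothetical Python sketch of that bookkeeping, not the authors' code. The graph representation (`graph` mapping each state-action pair to the pairs whose transitions it reveals), the state/action sizes, and all names are illustrative assumptions.

```python
import numpy as np

S, A = 5, 2  # small tabular MDP, for illustration only

# n(s, a) and n(s, a, s'): sample counts backing the empirical model.
counts = np.zeros((S, A))
trans_counts = np.zeros((S, A, S))

# Hypothetical feedback graph: playing (s, a) also reveals one
# transition sample for every pair in graph[(s, a)]; every pair
# observes itself (self-loops), plus any extra observation edges.
graph = {(s, a): {(s, a)} for s in range(S) for a in range(A)}
graph[(0, 0)].add((1, 1))  # example of an extra observation edge

def update_model(revealed):
    """Fold every revealed transition sample (s, a, s_next) into the
    counts; samples coming from feedback-graph neighbours are treated
    exactly like samples from the pair actually played."""
    for s, a, s_next in revealed:
        counts[s, a] += 1
        trans_counts[s, a, s_next] += 1

def empirical_transitions():
    """Empirical P_hat(s' | s, a); uniform for pairs never observed."""
    p = trans_counts / np.maximum(counts[:, :, None], 1)
    p[counts == 0] = 1.0 / S
    return p
```

Because `update_model` credits every revealed sample, a pair the policy never plays can still accumulate observations through its in-neighbours in the graph, which is what lets the bounds scale with graph quantities (mas-number, domination number) rather than the number of state-action pairs.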
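The Pseudocode row points to Algorithm 1 (optimistic model-based RL) and Algorithm 2 (SampleEpisode). The second sketch below shows the general shape of such a loop under the same assumptions as the previous snippet: optimistic finite-horizon value iteration with an exploration bonus, then a greedy rollout that collects all revealed samples. The bonus form, constants, and the `step_fn` environment hook are placeholders, not the paper's construction.

```python
import numpy as np

H = 10  # episode horizon, illustrative

def optimistic_q(p_hat, rewards, counts):
    """Finite-horizon value iteration on the empirical model with an
    illustrative confidence bonus (the paper derives its own)."""
    S, A = counts.shape
    q = np.zeros((H + 1, S, A))
    bonus = H / np.sqrt(np.maximum(counts, 1))
    for h in range(H - 1, -1, -1):
        v_next = q[h + 1].max(axis=1)                # V_{h+1}(s)
        q[h] = np.minimum(rewards + p_hat @ v_next + bonus, H)
    return q

def sample_episode(q, s1, step_fn):
    """Greedy rollout from s1. step_fn(s, a) is a hypothetical
    environment hook returning (s_next, revealed), where `revealed`
    holds all transition samples exposed by the feedback graph,
    including (s, a, s_next) itself."""
    s, data = s1, []
    for h in range(H):
        a = int(q[h, s].argmax())
        s, revealed = step_fn(s, a)
        data.extend(revealed)
    return data
```

Each episode then alternates plan (`optimistic_q`), rollout (`sample_episode`), and model update (`update_model(data)` from the first sketch), mirroring the plan-act-update structure that the paper's Algorithms 1 and 2 formalize.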