Private Reinforcement Learning with PAC and Regret Guarantees
Authors: Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Steven Wu
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee. Our algorithm only pays for a moderate privacy cost on exploration: in comparison to the non-private bounds, the privacy parameter only appears in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP. (The shape of these bounds is sketched schematically after the table.) |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, University of Minnesota; 2 Now at DeepMind; 3 Microsoft Research. Correspondence to: Giuseppe Vietri <vietr002@umn.edu>, Zhiwei Steven Wu <zstevenwu@cmu.edu>, Akshay Krishnamurthy <akshaykr@microsoft.com>, Borja Balle <borja.balle@gmail.com>. |
| Pseudocode | Yes (a simplified, hedged code sketch follows the table) | Algorithm 2 Private Upper Confidence Bound (PUCB) [...] Algorithm 3 PrivQ(r̃, ñ, m̃, ε) |
| Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not involve empirical training or evaluation on datasets. Therefore, no information about publicly available datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experimental setup, including hyperparameters or training settings. |
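As a reading aid for the "Research Type" row: the abstract's claim that "the privacy parameter only appears in lower-order terms" means the regret (and sample-complexity) bounds split into a leading term matching the non-private rate and an additive privacy term that does not grow with the leading dependence on the number of steps. The display below shows only the *shape* of that statement, not the paper's exact bound; here $S$, $A$, $H$, $T$, and $\varepsilon$ denote the number of states, actions, the horizon, the number of steps, and the JDP parameter.

$$
\mathrm{Regret}(T) \;\le\; \underbrace{\tilde{O}\!\left(\sqrt{\mathrm{poly}(S,A,H)\,T}\right)}_{\text{non-private leading term}} \;+\; \underbrace{\tilde{O}\!\left(\frac{\mathrm{poly}(S,A,H)}{\varepsilon}\right)}_{\text{privacy cost, lower order in } T}
$$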
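For the "Pseudocode" row, the following is a minimal, hedged sketch of the idea behind an optimism-based learner that plans on privatized statistics. It is not the paper's PUCB: PUCB releases visit counts, reward sums, and transition counts through binary-mechanism counters to obtain joint differential privacy over the full episode stream, whereas this sketch simply perturbs the accumulated statistics with Laplace noise at planning time and inflates the exploration bonus. The class name `SimplifiedPrivateUCB`, the even budget split, and the bonus form are illustrative assumptions, not the paper's specification.

```python
import numpy as np


class SimplifiedPrivateUCB:
    """Hedged sketch of an optimism-based tabular learner on noisy statistics.

    NOT the paper's PUCB: the paper privatizes the streams of visit counts,
    reward sums, and transition counts with binary-mechanism counters to get
    joint differential privacy; here we only add fresh Laplace noise to the
    accumulated statistics at planning time, to illustrate the
    "noisy counts + enlarged exploration bonus" idea.
    """

    def __init__(self, n_states, n_actions, horizon, epsilon, seed=0):
        self.S, self.A, self.H = n_states, n_actions, horizon
        self.epsilon = epsilon                      # privacy parameter (illustrative use)
        self.rng = np.random.default_rng(seed)
        self.visit = np.zeros((self.S, self.A))             # n(s, a)
        self.reward_sum = np.zeros((self.S, self.A))        # sum of rewards at (s, a)
        self.trans = np.zeros((self.S, self.A, self.S))     # m(s, a, s')

    def _noisy_stats(self):
        """Release Laplace-perturbed copies of the statistics (even budget split assumed)."""
        scale = 3.0 / self.epsilon  # sensitivity-1 counts, budget epsilon / 3 each
        n_tilde = self.visit + self.rng.laplace(0.0, scale, self.visit.shape)
        r_tilde = self.reward_sum + self.rng.laplace(0.0, scale, self.reward_sum.shape)
        m_tilde = self.trans + self.rng.laplace(0.0, scale, self.trans.shape)
        return np.maximum(n_tilde, 1.0), r_tilde, np.maximum(m_tilde, 0.0)

    def plan(self):
        """Optimistic backward induction on the noisy empirical model."""
        n_tilde, r_tilde, m_tilde = self._noisy_stats()
        r_hat = np.clip(r_tilde / n_tilde, 0.0, 1.0)        # assumes rewards in [0, 1]
        p_hat = m_tilde / np.maximum(m_tilde.sum(axis=2, keepdims=True), 1e-9)
        # Exploration bonus: a UCB-style term plus a term covering the noise scale.
        bonus = self.H * np.sqrt(1.0 / n_tilde) + self.H / (self.epsilon * n_tilde)
        q = np.zeros((self.H + 1, self.S, self.A))
        for h in range(self.H - 1, -1, -1):
            v_next = q[h + 1].max(axis=1)                   # optimistic value at step h + 1
            q[h] = np.minimum(r_hat + p_hat @ v_next + bonus, self.H)
        return q                                            # acting greedily in q[h] gives the policy

    def update(self, s, a, r, s_next):
        """Record one observed transition in the internal (pre-noise) statistics."""
        self.visit[s, a] += 1
        self.reward_sum[s, a] += r
        self.trans[s, a, s_next] += 1
```

A driver loop would call `plan()` before each episode, act greedily with respect to `q[h]` at step h, and feed every observed transition to `update(...)`. Note that re-randomizing the statistics at every episode, as this sketch does, would consume additional privacy budget under composition; the paper's counter-based release avoids this, which is one reason the sketch should be read as an illustration rather than the algorithm itself.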