Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Private Reinforcement Learning with PAC and Regret Guarantees
Authors: Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Steven Wu
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee. Our algorithm only pays for a moderate privacy cost on exploration: in comparison to the non-private bounds, the privacy parameter only appears in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Engineering, University of Minnesota 2Now at Deepmind 3Microsoft Research. Correspondence to: Giuseppe Vietri <EMAIL>, Zhiwei Steven Wu <EMAIL>, Akshay Krishnamurthy <EMAIL>, Borja Balle <EMAIL>. |
| Pseudocode | Yes | Algorithm 2 Private Upper Confidence Bound (PUCB) [...] Algorithm 3 PrivQ(r̃, ñ, m̃, ε) |
| Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not involve empirical training or evaluation on datasets. Therefore, no information about publicly available datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experimental setup, including hyperparameters or training settings. |