Private Reinforcement Learning with PAC and Regret Guarantees

Authors: Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Steven Wu

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee. Our algorithm only pays for a moderate privacy cost on exploration: in comparison to the non-private bounds, the privacy parameter only appears in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP.
Researcher Affiliation	Collaboration	(1) Department of Computer Science and Engineering, University of Minnesota; (2) now at DeepMind; (3) Microsoft Research. Correspondence to: Giuseppe Vietri <vietr002@umn.edu>, Zhiwei Steven Wu <zstevenwu@cmu.edu>, Akshay Krishnamurthy <akshaykr@microsoft.com>, Borja Balle <borja.balle@gmail.com>.
Pseudocode	Yes	Algorithm 2 Private Upper Confidence Bound (PUCB) [...] Algorithm 3 PrivQ(r̃, ñ, m̃, ε)
Open Source Code	No	The paper does not provide any statement or link regarding the release of open-source code for the described methodology.
Open Datasets	No	The paper is theoretical and does not involve empirical training or evaluation on datasets. Therefore, no information about publicly available datasets is provided.
Dataset Splits	No	The paper is theoretical and does not describe empirical experiments or data splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not describe any experimental setup that would require hardware specifications.
Software Dependencies	No	The paper is theoretical and does not describe any experimental setup that would require specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not describe any empirical experimental setup, including hyperparameters or training settings.