Differentially Private Reinforcement Learning with Self-Play
Authors: Dan Qiao, Yu-Xiang Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Then we design a provably efficient algorithm based on optimistic Nash value iteration and privatization of Bernstein-type bonuses. The algorithm satisfies JDP and LDP requirements when instantiated with appropriate privacy mechanisms. Furthermore, for both notions of DP, our regret bound generalizes the best known result for single-agent RL, and it also reduces to the best known result for multi-agent RL without privacy constraints. |
| Researcher Affiliation | Academia | Dan Qiao, Department of Computer Science & Engineering, University of California, San Diego, San Diego, CA 92093, d2qiao@ucsd.edu; Yu-Xiang Wang, Halıcıoğlu Data Science Institute, University of California, San Diego, San Diego, CA 92093, yuxiangw@ucsd.edu |
| Pseudocode | Yes | Algorithm 1 Differentially Private Optimistic Nash Value Iteration (DP-Nash-VI) |
| Open Source Code | No | This is a theory paper and we do not conduct experiments. The paper does not provide any statement or link regarding the release of source code. |
| Open Datasets | No | This is a theory paper and we do not conduct experiments. As such, no datasets are used for training or mentioned as publicly available. |
| Dataset Splits | No | This is a theory paper and we do not conduct experiments. Therefore, no dataset splits for validation are provided. |
| Hardware Specification | No | This is a theory paper and we do not conduct experiments. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | This is a theory paper and we do not conduct experiments. Therefore, no specific software dependencies or versions for experimental reproduction are listed. |
| Experiment Setup | No | This is a theory paper and we do not conduct experiments. Therefore, no experimental setup details such as hyperparameters or system-level training settings are provided. |
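
For readers who want a concrete picture of the mechanism summarized in the Research Type row, below is a minimal, hypothetical Python sketch of one backward step of optimistic Nash value iteration with privatized counts and a Bernstein-type bonus. It assumes a tabular two-player zero-sum Markov game, uses a Laplace mechanism as a stand-in for the paper's privatizers, omits the paper's exact constants and its Nash-equilibrium subroutine (a pure-strategy max-min is substituted), and is not the authors' DP-Nash-VI implementation.

```python
import numpy as np

def privatize_counts(counts, epsilon, rng):
    """Add Laplace noise to visitation counts (stand-in for the paper's privatizers)."""
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return np.maximum(noisy, 1.0)  # clip so noisy counts stay usable as denominators

def bernstein_bonus(var, n, H, delta):
    """Bernstein-type exploration bonus (schematic; exact constants differ in the paper)."""
    log_term = np.log(2.0 / delta)
    return np.sqrt(2.0 * var * log_term / n) + 7.0 * H * log_term / (3.0 * n)

def dp_nash_vi_step(counts_sabs, rewards, V_next, H, epsilon, delta, rng):
    """One backward step of an optimistic Nash value-iteration sketch for a zero-sum Markov game.

    counts_sabs: (S, A, B, S) visitation counts over (state, max-action, min-action, next state)
    rewards:     (S, A, B) rewards for the max-player
    V_next:      (S,) optimistic value estimate at the next step
    """
    noisy = privatize_counts(counts_sabs, epsilon, rng)   # privatized transition counts
    n_sab = noisy.sum(axis=-1)                            # privatized visit counts per (s, a, b)
    P_hat = noisy / n_sab[..., None]                      # private empirical transition kernel

    mean = P_hat @ V_next                                 # E[V_next] under P_hat, shape (S, A, B)
    var = np.maximum(P_hat @ (V_next ** 2) - mean ** 2, 0.0)
    bonus = bernstein_bonus(var, n_sab, H, delta)

    Q = np.clip(rewards + mean + bonus, 0.0, H)           # optimistic Q, truncated at H
    # Simplification: a max-min over pure actions replaces the Nash-equilibrium
    # subroutine on the matrix game Q[s, :, :] used by the actual algorithm.
    V = Q.min(axis=2).max(axis=1)
    return Q, V

# Tiny usage example on random data.
rng = np.random.default_rng(0)
S, A, B, H = 4, 2, 2, 5
counts = rng.integers(1, 20, size=(S, A, B, S)).astype(float)
rewards = rng.uniform(0.0, 1.0, size=(S, A, B))
Q, V = dp_nash_vi_step(counts, rewards, np.zeros(S), H=H, epsilon=1.0, delta=0.05, rng=rng)
print(V)
```

In the paper's actual algorithm the noise calibration and failure probabilities are coupled to the JDP/LDP guarantees across all episodes; the noise scale `1/epsilon` above is purely illustrative.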