Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards
Authors: Yulian Wu, Xingyu Zhou, Sayak Ray Chowdhury, Di Wang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All proofs and experiments are included in the Appendix. In this section, we conduct proof-of-concept numerical experiments to verify our theoretical results for both policy-based and value-based algorithms. |
| Researcher Affiliation | Collaboration | (1) Provable Responsible AI and Data Analytics Lab; (2) King Abdullah University of Science and Technology, Saudi Arabia; (3) Wayne State University, USA; (4) Microsoft Research, India. |
| Pseudocode | Yes | Our general framework, the Private-Heavy-UCBVI algorithm, is presented in Algorithm 1. See Algorithm 3 for details. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We consider the standard tabular MDP environment River Swim (Osband et al., 2013) |
| Dataset Splits | No | The paper does not explicitly describe training/test/validation dataset splits, as it uses a simulated MDP environment where data is generated interactively over episodes. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | We set all the parameters in our proposed algorithms in the same order as the theoretical results. We tune the learning rate η and the scaling of the confidence interval to obtain the best results. We run 10 independent experiments, each consisting of K = 2 × 10^4 episodes, and each episode is reset every H = 20 steps. |
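For illustration, a minimal sketch of the experimental protocol summarized above (River Swim MDP, K = 2 × 10^4 episodes of horizon H = 20, averaged over 10 independent runs). The `make_env` and `make_agent` interfaces below are hypothetical placeholders, not the authors' code, and stand in for the environment and the private episodic RL algorithm.

```python
import numpy as np

# Illustrative sketch only: the environment/agent factories are hypothetical
# placeholders for the River Swim MDP and the paper's algorithms.
K = 2 * 10**4   # number of episodes per run
H = 20          # horizon (episode length in steps)
N_RUNS = 10     # independent experiment repetitions

def run_experiment(make_env, make_agent, seed):
    """Run one independent experiment of K episodes and return episode returns."""
    rng = np.random.default_rng(seed)
    env = make_env(rng)
    agent = make_agent(horizon=H, rng=rng)
    episode_returns = np.zeros(K)
    for k in range(K):
        state = env.reset()          # episode is reset every H steps
        total_reward = 0.0
        for h in range(H):
            action = agent.act(state, h)
            next_state, reward = env.step(state, action)
            agent.observe(state, action, reward, next_state, h)
            total_reward += reward
            state = next_state
        agent.end_episode()          # e.g. private statistics update per episode
        episode_returns[k] = total_reward
    return episode_returns

# Aggregate over 10 independent runs, as in the reported protocol:
# returns = np.stack([run_experiment(make_river_swim, make_private_agent, s)
#                     for s in range(N_RUNS)])
# mean_return = returns.mean(axis=0)
```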