Improved Regret for Differentially Private Exploration in Linear MDP
Authors: Dung Daniel T Ngo, Giuseppe Vietri, Steven Wu
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on solving the problem of reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem due to (Luyo et al., 2021) achieves a regret rate that has a dependence of O(K3/5) on the number of episodes K. We provide a private algorithm with an improved regret rate with an optimal dependence of O(K) on the number of episodes. |
| Researcher Affiliation | Academia | 1 Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN 55455, USA 2School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Correspondence to: Dung Daniel Ngo <ngo00054@umn.edu>, Giuseppe Vietri <vietr002@umn.edu>, Zhiwei Steven Wu <zstevenwu@cmu.edu>. |
| Pseudocode | Yes | Algorithm 1 Private LSVI-UCB input Privacy parameter ρ, policy update rate C, fail probability p, confidence width β. 1: Notation: Let φ(xi h, ai h) = φi h |
| Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets. It discusses a linear MDP setting and user interactions in an abstract sense without referencing any concrete, publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets, therefore no training/validation/test splits are discussed. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup, thus no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs, rather than empirical experiments. Therefore, it does not provide details about an experimental setup or hyperparameters. |