Improved Regret for Differentially Private Exploration in Linear MDP

Authors: Dung Daniel T Ngo, Giuseppe Vietri, Steven Wu

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on solving the problem of reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem due to (Luyo et al., 2021) achieves a regret rate that has a dependence of O(K3/5) on the number of episodes K. We provide a private algorithm with an improved regret rate with an optimal dependence of O(K) on the number of episodes.
Researcher Affiliation Academia 1 Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN 55455, USA 2School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Correspondence to: Dung Daniel Ngo <ngo00054@umn.edu>, Giuseppe Vietri <vietr002@umn.edu>, Zhiwei Steven Wu <zstevenwu@cmu.edu>.
Pseudocode Yes Algorithm 1 Private LSVI-UCB input Privacy parameter ρ, policy update rate C, fail probability p, confidence width β. 1: Notation: Let φ(xi h, ai h) = φi h
Open Source Code No The paper does not provide any statement or link regarding the release of open-source code for the methodology described.
Open Datasets No The paper is theoretical and does not conduct experiments on specific datasets. It discusses a linear MDP setting and user interactions in an abstract sense without referencing any concrete, publicly available dataset.
Dataset Splits No The paper is theoretical and does not involve empirical validation on datasets, therefore no training/validation/test splits are discussed.
Hardware Specification No The paper is theoretical and does not describe any experimental setup, thus no hardware specifications are mentioned.
Software Dependencies No The paper is theoretical and does not describe any experimental setup, thus no specific software dependencies with version numbers are mentioned.
Experiment Setup No The paper is theoretical and focuses on algorithm design and proofs, rather than empirical experiments. Therefore, it does not provide details about an experimental setup or hyperparameters.