reproducibilityindex.ai

Improved Regret for Differentially Private Exploration in Linear MDP

Authors: Dung Daniel T Ngo, Giuseppe Vietri, Steven Wu

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on solving the problem of reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem due to (Luyo et al., 2021) achieves a regret rate that has a dependence of O(K3/5) on the number of episodes K. We provide a private algorithm with an improved regret rate with an optimal dependence of O(K) on the number of episodes.
Researcher Affiliation	Academia	1 Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN 55455, USA 2School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Correspondence to: Dung Daniel Ngo <ngo00054@umn.edu>, Giuseppe Vietri <vietr002@umn.edu>, Zhiwei Steven Wu <zstevenwu@cmu.edu>.
Pseudocode	Yes	Algorithm 1 Private LSVI-UCB input Privacy parameter ρ, policy update rate C, fail probability p, conﬁdence width β. 1: Notation: Let φ(xi h, ai h) = φi h
Open Source Code	No	The paper does not provide any statement or link regarding the release of open-source code for the methodology described.
Open Datasets	No	The paper is theoretical and does not conduct experiments on specific datasets. It discusses a linear MDP setting and user interactions in an abstract sense without referencing any concrete, publicly available dataset.
Dataset Splits	No	The paper is theoretical and does not involve empirical validation on datasets, therefore no training/validation/test splits are discussed.
Hardware Specification	No	The paper is theoretical and does not describe any experimental setup, thus no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and does not describe any experimental setup, thus no specific software dependencies with version numbers are mentioned.
Experiment Setup	No	The paper is theoretical and focuses on algorithm design and proofs, rather than empirical experiments. Therefore, it does not provide details about an experimental setup or hyperparameters.