Differentially Private Policy Evaluation

Authors: Borja Balle, Maziar Gomrokchi, Doina Precup

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present the first differentially private algorithms for reinforcement learning, which apply to the task of evaluating a fixed policy. We establish two approaches for achieving differential privacy, provide a theoretical analysis of the privacy and utility of the two algorithms, and show promising results on simple empirical examples.
Researcher Affiliation | Academia | Borja Balle (B.DEBALLEPIGEM@LANCASTER.AC.UK), Lancaster University; Maziar Gomrokchi (MGOMRO@CS.MCGILL.CA) and Doina Precup (DPRECUP@CS.MCGILL.CA), McGill University
Pseudocode | Yes | Algorithm 1: DP-LSW; Algorithm 2: DP-LSL (a generic output-perturbation sketch is given after this table)
Open Source Code | No | No statement or link was found indicating that the source code for the methodology described in this paper is publicly available.
Open Datasets | No | The paper uses 'synthetic MDPs' for its experiments and states 'Trajectories are drawn by starting in an initial state distribution and generating state-action-reward transitions according to the described probabilities until the absorbing state is reached. Trajectories are harvested in a batch'. It does not refer to a publicly available dataset with a link, DOI, or formal citation. (A sketch of this kind of trajectory generation is given after the table.)
Dataset Splits | No | The paper does not explicitly provide training/validation/test splits, specific percentages, or counts. It mentions 'batch sizes' but not how data was partitioned for training, validation, or testing.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper does not specify any software names with version numbers (e.g., Python, PyTorch, scikit-learn versions) that were used in the experiments.
Experiment Setup | Yes | 'The main results are summarized in Fig. 1, for an environment with N = 40 states, p = 0.5, discount γ = 0.99, and for the DP algorithms, ε = 0.1 and δ = 0.1. ... We experiment with both a tabular representation of the value function, as well as with function approximation. ... Standard errors computed over 20 independent runs are included.' (A hypothetical reconstruction of this configuration is sketched after the table.)
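
Since no source code is released, the snippet below is only a minimal, hypothetical sketch of the generic output-perturbation pattern that differentially private least-squares policy evaluation follows. It is not the paper's DP-LSW or DP-LSL mechanism: the sensitivity bounds and noise calibration in the paper differ, and the function name, regularization constant, and `sensitivity` argument here are placeholders.

```python
# Hypothetical sketch of output perturbation for least-squares policy evaluation.
# This shows the generic Gaussian-mechanism pattern only; the paper's DP-LSW and
# DP-LSL algorithms use their own sensitivity analysis, not reproduced here.
import numpy as np

def noisy_least_squares_values(Phi, returns, epsilon, delta, sensitivity, reg=1e-3):
    """Fit value-function weights by ridge regression, then add Gaussian noise.

    Phi:         (n_trajectories, n_features) features of trajectory start states.
    returns:     (n_trajectories,) empirical discounted returns.
    sensitivity: assumed L2 sensitivity of the least-squares solution (placeholder).
    """
    A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])
    w = np.linalg.solve(A, Phi.T @ returns)
    # Gaussian mechanism: noise scale from the assumed sensitivity and (epsilon, delta).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return w + np.random.normal(0.0, sigma, size=w.shape)
```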
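
As a minimal sketch of the trajectory-harvesting procedure quoted in the Open Datasets row, the code below rolls out a simple N-state chain MDP with an absorbing terminal state. The chain dynamics, the reward of 1 at termination, and the batch size of 100 are assumptions made for illustration; the paper describes its synthetic MDPs only at the level quoted above.

```python
# Minimal sketch of synthetic-MDP trajectory generation: start from an initial
# state and generate state-reward transitions until the absorbing state is
# reached, then harvest the trajectories in a batch.
import numpy as np

def sample_trajectory(N=40, p=0.5, rng=None):
    """Roll out one trajectory: from state 0, move right with probability p
    (otherwise stay), collecting a reward of 1 on reaching absorbing state N-1."""
    rng = rng if rng is not None else np.random.default_rng()
    state, traj = 0, []
    while state != N - 1:                       # state N-1 acts as the absorbing state
        next_state = state + 1 if rng.random() < p else state
        reward = 1.0 if next_state == N - 1 else 0.0
        traj.append((state, reward))
        state = next_state
    return traj

# Trajectories are harvested in a batch, as described in the paper's setup.
batch = [sample_trajectory() for _ in range(100)]
```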
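
Finally, a hypothetical reconstruction of the reported experiment configuration. The numeric values (N = 40, p = 0.5, γ = 0.99, ε = 0.1, δ = 0.1, 20 runs) are taken from the Experiment Setup row; the `run_once` estimator is a placeholder standing in for one evaluation run, and the standard-error computation mirrors the paper's reporting over 20 independent runs.

```python
# Hypothetical reconstruction of the reported configuration; the constants are
# the ones quoted from the paper, while run_once is a placeholder estimator.
import numpy as np

CONFIG = dict(n_states=40, p=0.5, gamma=0.99, epsilon=0.1, delta=0.1, n_runs=20)

def discounted_return(rewards, gamma):
    """Discounted sum of a reward sequence: sum_t gamma**t * r_t."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def run_once(rng):
    """Placeholder for one evaluation run (e.g., a private vs. non-private
    value estimate); here it just scores a dummy reward sequence."""
    rewards = rng.random(50)
    return discounted_return(rewards, CONFIG["gamma"])

rng = np.random.default_rng(0)
results = np.array([run_once(rng) for _ in range(CONFIG["n_runs"])])
mean = results.mean()
stderr = results.std(ddof=1) / np.sqrt(CONFIG["n_runs"])   # standard error over 20 runs
```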