Differentially Private Policy Evaluation
Authors: Borja Balle, Maziar Gomrokchi, Doina Precup
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present the first differentially private algorithms for reinforcement learning, which apply to the task of evaluating a fixed policy. We establish two approaches for achieving differential privacy, provide a theoretical analysis of the privacy and utility of the two algorithms, and show promising results on simple empirical examples. |
| Researcher Affiliation | Academia | Borja Balle (B.DEBALLEPIGEM@LANCASTER.AC.UK), Lancaster University; Maziar Gomrokchi (MGOMRO@CS.MCGILL.CA) and Doina Precup (DPRECUP@CS.MCGILL.CA), McGill University |
| Pseudocode | Yes | Algorithm 1: DP-LSW; Algorithm 2: DP-LSL (an illustrative output-perturbation sketch appears after the table) |
| Open Source Code | No | No statement or link indicating that the source code for the methodology described in this paper is publicly available was found. |
| Open Datasets | No | The paper uses synthetic MDPs for its experiments and states that 'Trajectories are drawn by starting in an initial state distribution and generating state-action-reward transitions according to the described probabilities until the absorbing state is reached. Trajectories are harvested in a batch.' It does not cite a publicly available dataset with a link, DOI, or formal citation. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits, specific percentages, or counts. It mentions 'batch sizes' but not how data was partitioned for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software names with version numbers (e.g., Python, PyTorch, scikit-learn versions) that were used in the experiments. |
| Experiment Setup | Yes | The main results are summarized in Fig. 1, for an environment with N = 40 states, p = 0.5, discount γ = 0.99, and for the DP algorithms, ε = 0.1 and δ = 0.1. ... We experiment with both a tabular representation of the value function, as well as with function approximation. ... Standard errors computed over 20 independent runs are included. (These values are mirrored in the second illustrative sketch after the table.) |
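
The pseudocode row names the paper's two algorithms, DP-LSW and DP-LSL, which by the paper's naming are least-squares-based policy-evaluation routines released under (ε, δ)-differential privacy. The following is a minimal Python sketch of a generic output-perturbation release of a ridge-regularized least-squares solution; the function name, the `sensitivity` argument, and the generic Gaussian-mechanism noise calibration are illustrative assumptions and do not reproduce the paper's problem-specific sensitivity analysis.

```python
import numpy as np

def dp_least_squares_release(Phi, y, lam, sensitivity, eps, delta, rng=None):
    """Output-perturbation sketch for a private least-squares value estimate.

    Phi         : (n, d) feature matrix paired with observed targets y
    lam         : ridge regularization strength
    sensitivity : assumed L2-sensitivity bound of the exact solution between
                  neighbouring trajectory datasets (the paper derives its own
                  problem-specific bounds, which are not reproduced here)
    eps, delta  : differential-privacy parameters, e.g. 0.1 and 0.1
    """
    rng = np.random.default_rng() if rng is None else rng
    d = Phi.shape[1]
    # Exact (non-private) ridge-regularized least-squares solution.
    A = Phi.T @ Phi + lam * np.eye(d)
    w = np.linalg.solve(A, Phi.T @ y)
    # Generic Gaussian-mechanism noise calibration for (eps, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return w + rng.normal(scale=sigma, size=d)
```

With a tabular representation, `Phi` reduces to an indicator matrix over states, so the released vector is a noisy per-state value estimate; with function approximation it is a noisy weight vector.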
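
The dataset and experiment-setup rows describe batch generation of trajectories from a synthetic MDP but give no full recipe. The sketch below shows, in Python, how a batch of trajectories for a chain of the reported size (N = 40, p = 0.5, γ = 0.99) might be harvested over 20 independent runs and reduced to discounted Monte Carlo returns; the chain's dynamics, reward placement, and batch size are assumptions for illustration, not the paper's exact environment.

```python
import numpy as np

# Values reported in the paper's experiment setup; the chain structure,
# reward placement, and batch size below are illustrative assumptions.
N, P, GAMMA = 40, 0.5, 0.99
BATCH_SIZE, NUM_RUNS = 1000, 20  # batch size is an assumption

def sample_trajectory(rng):
    """Roll out one trajectory under the fixed policy until absorption.

    From state s the chain advances with probability P and stays put
    otherwise, absorbing at state N - 1; a reward of 1 at absorption is an
    illustrative assumption.
    """
    s, rewards = 0, []
    while s < N - 1:
        s = s + 1 if rng.random() < P else s
        rewards.append(1.0 if s == N - 1 else 0.0)
    return rewards

def discounted_return(rewards, gamma=GAMMA):
    """Discounted Monte Carlo return of a single trajectory."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Trajectories are harvested in a batch; results would be averaged over
# NUM_RUNS independent runs to obtain standard errors.
for run in range(NUM_RUNS):
    rng = np.random.default_rng(run)
    batch = [sample_trajectory(rng) for _ in range(BATCH_SIZE)]
    returns = np.array([discounted_return(tr) for tr in batch])
```

The resulting per-trajectory returns are the kind of input a private evaluation step, such as the release sketched above with ε = δ = 0.1, would consume.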