Differentially Private Policy Evaluation
Authors: Borja Balle, Maziar Gomrokchi, Doina Precup
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present the first differentially private algorithms for reinforcement learning, which apply to the task of evaluating a fixed policy. We establish two approaches for achieving differential privacy, provide a theoretical analysis of the privacy and utility of the two algorithms, and show promising results on simple empirical examples. |
| Researcher Affiliation | Academia | Borja Balle (B.DEBALLEPIGEM@LANCASTER.AC.UK), Lancaster University; Maziar Gomrokchi (MGOMRO@CS.MCGILL.CA) and Doina Precup (DPRECUP@CS.MCGILL.CA), McGill University |
| Pseudocode | Yes | Algorithm 1: DP-LSW; Algorithm 2: DP-LSL (an illustrative output-perturbation sketch appears after the table) |
| Open Source Code | No | No statement or link indicating that the source code for the methodology described in this paper is publicly available was found. |
| Open Datasets | No | The paper uses synthetic MDPs for its experiments and states that 'Trajectories are drawn by starting in an initial state distribution and generating state-action-reward transitions according to the described probabilities until the absorbing state is reached. Trajectories are harvested in a batch.' It does not cite a publicly available dataset with a link, DOI, or formal citation. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits, specific percentages, or counts. It mentions 'batch sizes' but not how data was partitioned for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software names with version numbers (e.g., Python, PyTorch, scikit-learn versions) that were used in the experiments. |
| Experiment Setup | Yes | The main results are summarized in Fig. 1, for an environment with N = 40 states, p = 0.5, discount γ = 0.99, and for the DP algorithms, ε = 0.1 and δ = 0.1. ... We experiment with both a tabular representation of the value function, as well as with function approximation. ... Standard errors computed over 20 independent runs are included. (These values are mirrored in the second illustrative sketch after the table.) |
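
The pseudocode row names the paper's two algorithms, DP-LSW and DP-LSL, which by the paper's naming are least-squares-based policy-evaluation routines released under (ε, δ)-differential privacy. The following is a minimal Python sketch of a generic output-perturbation release of a ridge-regularized least-squares solution; the function name, the `sensitivity` argument, and the generic Gaussian-mechanism noise calibration are illustrative assumptions and do not reproduce the paper's problem-specific sensitivity analysis.

```python
import numpy as np

def dp_least_squares_release(Phi, y, lam, sensitivity, eps, delta, rng=None):
    """Output-perturbation sketch for a private least-squares value estimate.

    Phi         : (n, d) feature matrix paired with observed targets y
    lam         : ridge regularization strength
    sensitivity : assumed L2-sensitivity bound of the exact solution between
                  neighbouring trajectory datasets (the paper derives its own
                  problem-specific bounds, which are not reproduced here)
    eps, delta  : differential-privacy parameters, e.g. 0.1 and 0.1
    """
    rng = np.random.default_rng() if rng is None else rng
    d = Phi.shape[1]
    # Exact (non-private) ridge-regularized least-squares solution.
    A = Phi.T @ Phi + lam * np.eye(d)
    w = np.linalg.solve(A, Phi.T @ y)
    # Generic Gaussian-mechanism noise calibration for (eps, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return w + rng.normal(scale=sigma, size=d)
```

With a tabular representation, `Phi` reduces to an indicator matrix over states, so the released vector is a noisy per-state value estimate; with function approximation it is a noisy weight vector.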
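
The dataset and experiment-setup rows describe batch generation of trajectories from a synthetic MDP but give no full recipe. The sketch below shows, in Python, how a batch of trajectories for a chain of the reported size (N = 40, p = 0.5, γ = 0.99) might be harvested over 20 independent runs and reduced to discounted Monte Carlo returns; the chain's dynamics, reward placement, and batch size are assumptions for illustration, not the paper's exact environment.

```python
import numpy as np

# Values reported in the paper's experiment setup; the chain structure,
# reward placement, and batch size below are illustrative assumptions.
N, P, GAMMA = 40, 0.5, 0.99
BATCH_SIZE, NUM_RUNS = 1000, 20  # batch size is an assumption

def sample_trajectory(rng):
    """Roll out one trajectory under the fixed policy until absorption.

    From state s the chain advances with probability P and stays put
    otherwise, absorbing at state N - 1; a reward of 1 at absorption is an
    illustrative assumption.
    """
    s, rewards = 0, []
    while s < N - 1:
        s = s + 1 if rng.random() < P else s
        rewards.append(1.0 if s == N - 1 else 0.0)
    return rewards

def discounted_return(rewards, gamma=GAMMA):
    """Discounted Monte Carlo return of a single trajectory."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Trajectories are harvested in a batch; results would be averaged over
# NUM_RUNS independent runs to obtain standard errors.
for run in range(NUM_RUNS):
    rng = np.random.default_rng(run)
    batch = [sample_trajectory(rng) for _ in range(BATCH_SIZE)]
    returns = np.array([discounted_return(tr) for tr in batch])
```

The resulting per-trajectory returns are the kind of input a private evaluation step, such as the release sketched above with ε = δ = 0.1, would consume.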