Algorithms for Differentially Private Multi-Armed Bandits

Authors: Aristide Tossou, Christos Dimitrakakis

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (4 experiments) | We perform experiments using arms with rewards drawn from independent Bernoulli distributions. The plot, in logarithmic scale, shows the regret of the algorithms over 100,000 time steps, averaged over 100 runs. We targeted 2 different ϵ privacy levels: 0.1 and 1.
Researcher Affiliation | Academia | Aristide C. Y. Tossou and Christos Dimitrakakis, Chalmers University of Technology, Gothenburg, Sweden ({aristide, chrdimi}@chalmers.se)
Pseudocode | Yes | Algorithm 1 DP-UCB-BOUND:

    Input: ϵ, the differential privacy parameter.
    Instantiate K Hybrid Mechanisms (Chan, Shi, and Song 2010), one for each arm a.
    for t = 1 to T do
        if t ≤ K then
            Play arm a = t and observe the reward r_t
            Insert r_t into hybrid mechanism a
        else
            for all arms a do
                s_a(t) ← total sum computed using hybrid mechanism a
                n'_{a,t} ← n_{a,t} − 2^⌊log₂ n_{a,t}⌋   (remove the closest power of 2 from n_{a,t})
                ν_a ← (4√8/ϵ) · √(log t) · (log₂ n'_{a,t} + 1)
            end for
            Pull arm a_t = argmax_a [ s_a(t)/n_{a,t} + ν_a/n_{a,t} ]
            Observe the reward r_t
            Insert r_t into the hybrid mechanism for arm a_t
        end if
    end for
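A minimal Python sketch of DP-UCB-BOUND, reading the row above literally, follows. It is a sketch under assumptions rather than the authors' implementation: the paper's Hybrid Mechanism is replaced by the simpler Binary Mechanism from the same Chan, Shi, and Song (2010) work (which, unlike the hybrid variant, must know the horizon T up front), rewards are taken to be Bernoulli in {0, 1}, and the log₂ n'_{a,t} factor is guarded at n' = 0, a corner case the pseudocode leaves implicit. All identifiers (BinaryMechanism, dp_ucb_bound) are illustrative.

    import math
    import numpy as np

    class BinaryMechanism:
        """Private running sum of bounded rewards (Chan, Shi, and Song 2010).

        Stand-in for the paper's Hybrid Mechanism: the binary variant is
        simpler but must know the horizon T in advance.
        """

        def __init__(self, eps, T, rng):
            self.eps, self.T, self.rng, self.t = eps, T, rng, 0
            self.alpha = {}   # tree level -> exact dyadic block sum
            self.noisy = {}   # tree level -> Laplace-noised block sum

        def insert(self, x):
            self.t += 1
            i = (self.t & -self.t).bit_length() - 1         # lowest set bit of t
            self.alpha[i] = sum(self.alpha.get(j, 0.0) for j in range(i)) + x
            for j in range(i):                              # lower blocks merged away
                self.alpha[j] = self.noisy[j] = 0.0
            scale = max(math.log2(self.T), 1.0) / self.eps  # eps split over tree levels
            self.noisy[i] = self.alpha[i] + self.rng.laplace(0.0, scale)

        def total(self):
            # Noisy sum of the first t rewards: add the blocks picked out
            # by the binary digits of t.
            return sum(self.noisy[j] for j in range(self.t.bit_length())
                       if (self.t >> j) & 1)

    def dp_ucb_bound(mus, eps, T, seed=0):
        """DP-UCB-BOUND on Bernoulli arms; returns the cumulative regret."""
        rng = np.random.default_rng(seed)
        K = len(mus)
        mechs = [BinaryMechanism(eps, T, rng) for _ in range(K)]
        counts = [0] * K
        regret, best = 0.0, max(mus)
        for t in range(1, T + 1):
            if t <= K:
                a = t - 1                                   # play each arm once
            else:
                def index(arm):
                    n = counts[arm]
                    n_prime = n - (1 << (n.bit_length() - 1))  # drop closest power of 2
                    log_term = math.log2(n_prime) + 1 if n_prime > 0 else 1.0
                    nu = (4 * math.sqrt(8) / eps) * math.sqrt(math.log(t)) * log_term
                    return mechs[arm].total() / n + nu / n
                a = max(range(K), key=index)
            r = float(rng.random() < mus[a])                # Bernoulli reward in {0, 1}
            counts[a] += 1
            mechs[a].insert(r)
            regret += best - mus[a]
        return regret

Keeping one private sum mechanism per arm, as the pseudocode does, means each reward is inserted exactly once and the noisy sums can be reread at every round without spending additional privacy budget.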
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | No | "We perform experiments using arms with rewards drawn from independent Bernoulli distributions." The paper describes generating data for the experiments rather than using a pre-existing public dataset with explicit access information.
Dataset Splits | No | The paper describes running experiments for a fixed number of time steps over multiple runs, but it does not specify train/validation/test splits, which apply to traditional supervised learning datasets.
Hardware Specification | No | The paper does not provide any details about the specific hardware (e.g., GPU or CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers that would be required to reproduce the experiments.
Experiment Setup | Yes | We targeted 2 different ϵ privacy levels: 0.1 and 1. For DP-UCB-INT, we pick ϵ according to Corollary 3.2 such that the overall privacy is (ϵ, δ)-DP with δ = e^−10, ϵ ∈ {0.1, 1}, v = 1.1. We compared against the non-private UCB algorithm and the algorithm presented in (Mishra and Thakurta 2015) (Private-UCB), with a failure probability chosen to be t^−4. We perform two scenarios. First, we used two arms: one with expectation 0.9 and the other 0.6. The second scenario is more challenging, with 10 arms all having expectation 0.1 except two with 0.55 and 0.2.
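The scenarios in this row translate directly into a driver loop. The following hypothetical snippet wires them to the dp_ucb_bound sketch given earlier; the DP-UCB-INT and Private-UCB baselines and the Corollary 3.2 (ϵ, δ) calibration are not reproduced, and a pure-Python run at 100,000 steps over 100 runs will be slow.

    import numpy as np

    # Hypothetical experiment driver: the arm means, horizon, run count,
    # and ϵ levels follow the excerpt above; everything else is illustrative.
    scenarios = {
        "2 arms": [0.9, 0.6],
        "10 arms": [0.55, 0.2] + [0.1] * 8,
    }
    T, RUNS = 100_000, 100
    for name, mus in scenarios.items():
        for eps in (0.1, 1.0):
            regrets = [dp_ucb_bound(mus, eps, T, seed=run) for run in range(RUNS)]
            print(f"{name}, eps={eps}: mean regret over {RUNS} runs = {np.mean(regrets):.1f}")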