An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule

Authors: Touqir Sajed, Or Sheffet

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also compare empirically the performance of our algorithm with the private UCB algorithm." (Abstract); "our experiments (see Section 5) show a significantly improved performance empirically" (Introduction); "In this section, we empirically compare the existing DP-UCB algorithm to our DP-SE algorithm (Algorithm 3)." (Section 5)
Researcher Affiliation | Academia | "Department of Computing Science, University of Alberta, Edmonton AB, Canada. Correspondence to: Touqir Sajed <touqir@ualberta.ca>, Or Sheffet <osheffet@ualberta.ca>."
Pseudocode | Yes | Algorithm 1: DP-NAS; Algorithm 2: DP exponential NAS; Algorithm 3: DP Successive Elimination
Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology and no link to a code repository.
Open Datasets | Yes | "In our experiments we set the default values of T = 5 × 10^7, ε = 0.25 and K = 5. We assume T is a-priori known to both algorithms and set β = 1/T. Due to space constraints, we only present our empirical results on one instance where the reward means of the arms are linearly spaced in the range [0.25, 0.75]. Namely in our instance, for K = 5, the mean rewards are in {0.75, 0.625, 0.5, 0.375, 0.25}."
Dataset Splits | No | The paper describes the setup for simulating the multi-armed bandit problem with defined reward distributions, but it does not specify train/validation/test splits or cross-validation methods of the kind used to partition data in supervised learning tasks.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks) that would be needed to replicate the experiments.
Experiment Setup | Yes | "In our experiments we set the default values of T = 5 × 10^7, ε = 0.25 and K = 5. We assume T is a-priori known to both algorithms and set β = 1/T. Due to space constraints, we only present our empirical results on one instance where the reward means of the arms are linearly spaced in the range [0.25, 0.75]. Namely in our instance, for K = 5, the mean rewards are in {0.75, 0.625, 0.5, 0.375, 0.25}."
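
To make the quoted experiment setup concrete, below is a minimal Python sketch of the simulated bandit instance it describes. The values of T, ε, K, β and the arm means are taken from the quote above; the Bernoulli reward model, the random seed, and every identifier (e.g., `pull`) are illustrative assumptions rather than details from the paper.

```python
import numpy as np

# Parameters quoted in the table above; everything else is an assumption.
T = 5 * 10**7          # horizon, a-priori known to both algorithms
epsilon = 0.25         # differential-privacy parameter
K = 5                  # number of arms
beta = 1.0 / T         # confidence / failure parameter

# Reward means linearly spaced in [0.25, 0.75]: {0.75, 0.625, 0.5, 0.375, 0.25}.
arm_means = np.linspace(0.75, 0.25, K)

rng = np.random.default_rng(seed=0)

def pull(arm: int) -> float:
    """Draw one reward for `arm` (Bernoulli rewards are an assumption)."""
    return float(rng.random() < arm_means[arm])

if __name__ == "__main__":
    # Sanity check: empirical means from a few thousand pulls per arm.
    n = 5000
    for a in range(K):
        estimate = np.mean([pull(a) for _ in range(n)])
        print(f"arm {a}: true mean {arm_means[a]:.3f}, estimate {estimate:.3f}")
```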
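
The Pseudocode row above lists Algorithm 3, DP Successive Elimination, as the paper's main bandit algorithm. As a rough illustration of the general technique (and emphatically not a reproduction of the paper's Algorithm 3), the sketch below runs a generic epoch-based successive elimination loop in which each surviving arm's empirical mean is perturbed with Laplace noise; the noise scale, confidence radius, and elimination rule are simplified assumptions chosen only to show the idea.

```python
import numpy as np

def successive_elimination_with_laplace(arm_means, epsilon, beta,
                                        epochs=10, base_pulls=2000, rng=None):
    """Generic noisy successive elimination, NOT the paper's Algorithm 3.

    Each epoch pulls every surviving arm n times, adds Laplace noise
    calibrated to the sensitivity of a [0,1]-bounded mean (1/n), and
    drops arms whose noisy mean falls well below the current best.
    The constants in the confidence radius are illustrative.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    K = len(arm_means)
    active = list(range(K))

    for r in range(epochs):
        n = base_pulls * (2 ** r)  # longer epochs shrink the confidence radius
        noisy = {}
        for a in active:
            rewards = (rng.random(n) < arm_means[a]).astype(float)  # assumed Bernoulli
            noisy[a] = rewards.mean() + rng.laplace(scale=1.0 / (epsilon * n))

        # Hoeffding term for sampling error plus a Laplace tail term (assumed form).
        radius = (np.sqrt(np.log(4.0 * K / beta) / (2.0 * n))
                  + np.log(4.0 * K / beta) / (epsilon * n))

        best = max(noisy.values())
        active = [a for a in active if noisy[a] >= best - 2.0 * radius]
        if len(active) == 1:
            break

    return active

if __name__ == "__main__":
    # Instance from the quoted setup: 5 arms with means {0.75, ..., 0.25}.
    means = np.linspace(0.75, 0.25, 5)
    print(successive_elimination_with_laplace(means, epsilon=0.25, beta=2e-8))
```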