An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule
Authors: Touqir Sajed, Or Sheffet
Venue: ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also compare empirically the performance of our algorithm with the private UCB algorithm. (Abstract); our experiments (see Section 5) show a significantly improved performance empirically (Introduction); In this section, we empirically compare the existing DP-UCB algorithm to our DP-SE algorithm (Algorithm 3). (Section 5) |
| Researcher Affiliation | Academia | Department of Computing Science, University of Alberta, Edmonton AB, Canada. Correspondence to: Touqir Sajed <touqir@ualberta.ca>, Or Sheffet <osheffet@ualberta.ca>. |
| Pseudocode | Yes | Algorithm 1 DP-NAS; Algorithm 2 DP exponential NAS; Algorithm 3 DP Successive Elimination |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | In our experiments we set the default values of T = 5 × 10^7, ε = 0.25 and K = 5. We assume T is a-priori known to both algorithms and set β = 1/T. Due to space constraints, we only present our empirical results on one instance where the reward means of the arms are linearly spaced in the range [0.25, 0.75]. Namely in our instance, for K = 5, the mean rewards are in {0.75, 0.625, 0.5, 0.375, 0.25}. |
| Dataset Splits | No | The paper describes the setup for simulating the multi-armed bandit problem with defined parameters for reward distributions. However, it does not specify train/validation/test splits or cross-validation procedures, as these partitioning schemes do not apply to its simulated bandit experiments. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | In our experiments we set the default values of T = 5 × 10^7, ε = 0.25 and K = 5. We assume T is a-priori known to both algorithms and set β = 1/T. Due to space constraints, we only present our empirical results on one instance where the reward means of the arms are linearly spaced in the range [0.25, 0.75]. Namely in our instance, for K = 5, the mean rewards are in {0.75, 0.625, 0.5, 0.375, 0.25}. |
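
The Experiment Setup row above fully specifies the simulated environment (K = 5 arms with Bernoulli-style rewards whose means are linearly spaced in [0.25, 0.75], horizon T = 5 × 10^7, privacy parameter ε = 0.25, failure probability β = 1/T), so the instance can be reconstructed without any external dataset. The sketch below is a minimal, hypothetical Python reconstruction: it builds that instance and runs a simplified Laplace-noise successive-elimination loop in the spirit of the paper's Algorithm 3 (DP Successive Elimination). The doubling epoch length and the confidence width are illustrative assumptions made for this sketch, not the calibration used by DP-SE, and the per-epoch Laplace scale reflects only the standard sensitivity-1/n release of a bounded mean.

```python
# Minimal, hypothetical sketch of the experimental setup quoted above.
# It is NOT the paper's Algorithm 3 (DP-SE): epoch lengths and confidence
# widths below are illustrative assumptions made for this sketch only.
import numpy as np

rng = np.random.default_rng(0)

means = np.linspace(0.75, 0.25, 5)   # linearly spaced arm means (Section 5)
K = len(means)                       # K = 5 arms
T = 5 * 10**7                        # horizon T = 5 x 10^7
eps = 0.25                           # privacy budget epsilon
beta = 1.0 / T                       # failure probability beta = 1/T

active = list(range(K))
pulls = 0
epoch = 0

while len(active) > 1 and pulls < T:
    epoch += 1
    n = 2 ** epoch                   # illustrative doubling epoch length
    noisy = {}
    for a in active:
        rewards = rng.binomial(1, means[a], size=n)  # Bernoulli rewards
        pulls += n
        # Releasing the mean of n rewards in [0, 1] has sensitivity 1/n,
        # so Laplace noise of scale 1/(n * eps) gives an eps-DP release of
        # this single statistic (the paper's overall accounting differs).
        noisy[a] = rewards.mean() + rng.laplace(scale=1.0 / (n * eps))
    # Illustrative confidence width: sampling term plus privacy-noise term.
    log_term = np.log(4 * K * epoch**2 / beta)
    width = np.sqrt(log_term / (2 * n)) + log_term / (n * eps)
    best = max(noisy.values())
    active = [a for a in active if noisy[a] >= best - 2 * width]

print("surviving arm(s):", active, "total pulls:", pulls)
```

With the 0.125 gaps of this instance, the loop isolates the best arm well before the full T = 5 × 10^7 horizon; a full reproduction of the paper's comparison between DP-UCB and DP-SE would additionally track cumulative regret over the entire horizon rather than only the identity of the surviving arm.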