Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits

Authors: Zhiwei Wang, Huazheng Wang, Hongning Wang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states, "In addition to the theoretical analysis, we also performed an extensive set of experiments based on simulations to validate our results," and its Experiments section opens with, "We performed extensive empirical evaluations using simulation to verify our theoretical results against different MAB algorithms, attack methods, and environment configurations."
Researcher Affiliation | Academia | Zhiwei Wang (Tsinghua University), Huazheng Wang (Oregon State University), Hongning Wang (Tsinghua University)
Pseudocode | No | The paper describes the UCB1 and ϵ-greedy algorithms and its proposed attack method using textual descriptions and mathematical expressions, but it does not include a formally structured "Pseudocode" or "Algorithm" block. For example: "We now give the attack method. Suppose η ∈ (0, 1) is chosen by the attacker. The attacker attacks arms in the following way: for the first N rounds (N is the number of arms), when t < N, for arm i with 1 < i ≤ K, the attacker attacks the arm and spends minimal attack cost to make µ̂i(N) ≤ µ̂1(N) − 2β(1, η) − 2β(1) − d, where d ≥ 0 is a constant chosen by the attacker."
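The textual description quoted above can be sketched as code. Note this is our own illustration, not the paper's implementation: the exact form of the confidence width β(·) is an assumption loosely modeled on Jun et al. (2018), and the paper's definition may differ.

```python
import numpy as np

def beta(n, sigma=0.1, n_arms=10, delta=0.05):
    # Assumed high-probability confidence width in the style of Jun et al.
    # (2018); the paper's exact definition of beta may differ.
    return np.sqrt((2 * sigma**2 / n) * np.log(np.pi**2 * n_arms * n**2 / (3 * delta)))

def initial_round_attack(rewards, eta=0.05, d=0.1, sigma=0.1):
    """First-N-rounds attack sketch: after each arm has been pulled once,
    corrupt every non-target arm i > 1 just enough that its empirical mean
    sits at least 2*beta(1, eta) + 2*beta(1) + d below the target arm's."""
    n_arms = len(rewards)
    mu_hat = np.asarray(rewards, dtype=float).copy()  # one pull each: mean = reward
    corruptions = np.zeros(n_arms)
    margin = 2 * beta(1, sigma, n_arms, eta) + 2 * beta(1, sigma, n_arms) + d
    ceiling = mu_hat[0] - margin      # arm 1 (index 0) is the attacker's target
    for i in range(1, n_arms):
        if mu_hat[i] > ceiling:       # minimal attack cost: push down just enough
            corruptions[i] = mu_hat[i] - ceiling
            mu_hat[i] = ceiling
    return mu_hat, corruptions
```

Spending only the minimal cost is what keeps the attack stealthy: each corrupted empirical mean lands exactly on the threshold rather than far below it.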
Open Source Code | No | The paper does not contain any statements about releasing code for the methodology, nor does it provide links to any code repositories.
Open Datasets | No | The paper states, "We varied the number of arms N in {10, 30} and set each arm's reward distribution to an independent Gaussian distribution. The ground-truth mean reward µi of each arm i is sampled from N(0, 1)." This indicates a simulated environment rather than a publicly available dataset with concrete access information.
Dataset Splits | No | The paper mentions, "We run each experiment for T = 10000 (for N = 10) or 20000 (for N = 30) iterations and repeat each experiment 20 times to report the mean and variance of performance," but does not specify train/validation/test dataset splits. The experiments are based on simulations with randomly sampled reward distributions rather than fixed datasets.
Hardware Specification | No | The paper states, "We performed extensive empirical evaluations using simulation," but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run these simulations.
Software Dependencies | No | The paper describes its methods and experiments but does not provide any specific software dependencies with version numbers, such as "Python 3.x" or "PyTorch 1.x".
Experiment Setup | Yes | From the paper's Experiment Setup: "In our simulations, we execute the reward poisoning attack method proposed in (Jun et al. 2018) as our baseline and our attack algorithms against UCB1 and ϵ-greedy algorithms in the presence of attack detection. We varied the number of arms N in {10, 30} and set each arm's reward distribution to an independent Gaussian distribution. The ground-truth mean reward µi of each arm i is sampled from N(0, 1). For the ϵ-greedy algorithm, we set its exploration probability ϵ_t = min{1, CN/t}. We set C = 500 (any C > 3 suffices); this value is chosen only for the convenience of presenting the results. In all our experiments, we set the detection method's parameter δ to 0.05, the high-probability coefficient η to 0.05, and the reward's noise scale σ in the environment to 0.1. We run each experiment for T = 10000 (for N = 10) or 20000 (for N = 30) iterations and repeat each experiment 20 times to report the mean and variance of performance."
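As a concrete illustration of this setup, the simulated environment and the ϵ-greedy exploration schedule can be sketched as follows. Variable names are ours, and the learner, attack, and detection components (δ = η = 0.05) are omitted; this is a reading of the reported configuration, not the authors' code.

```python
import numpy as np

def make_environment(n_arms, sigma=0.1, rng=None):
    """Gaussian bandit: ground-truth means mu_i ~ N(0, 1), noise scale sigma."""
    rng = rng if rng is not None else np.random.default_rng()
    mu = rng.normal(0.0, 1.0, size=n_arms)
    def pull(arm):
        # Reward = true mean plus Gaussian noise with scale sigma
        return mu[arm] + rng.normal(0.0, sigma)
    return mu, pull

def epsilon_t(t, n_arms, C=500):
    """Exploration probability eps_t = min{1, C*N/t}; any C > 3 suffices."""
    return min(1.0, C * n_arms / t)

# Configurations matching the reported runs: (N, T) pairs, 20 repetitions each
for n_arms, horizon in [(10, 10_000), (30, 20_000)]:
    for run in range(20):
        mu, pull = make_environment(n_arms, sigma=0.1)
        # ... run UCB1 / eps-greedy under attack and detection for `horizon` rounds
```

With C = 500 and N = 10, ϵ_t stays at 1 for the first CN = 5000 rounds and then decays as CN/t, so the learner explores heavily early on.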