Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits

Authors: Zhiwei Wang, Huazheng Wang, Hongning Wang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states, "In addition to the theoretical analysis, we also performed an extensive set of experiments based on simulations to validate our results," and its Experiments section opens with, "We performed extensive empirical evaluations using simulation to verify our theoretical results against different MAB algorithms, attack methods, and environment configurations."
Researcher Affiliation | Academia | Zhiwei Wang (Tsinghua University), Huazheng Wang (Oregon State University), Hongning Wang (Tsinghua University)
Pseudocode | No | The paper describes the UCB1 and ϵ-greedy algorithms and its proposed attack method using textual descriptions and mathematical expressions, but it does not include a formally structured "Pseudocode" or "Algorithm" block. For example: "We now give the attack method. Suppose η ∈ (0, 1) is chosen by the attacker. The attacker attacks arms in the following way: for the first N rounds (N is the number of arms), when t < N, for arm i with 1 < i ≤ K, the attacker attacks the arm and spends minimal attack cost to make µ̂i(N) ≤ µ̂1(N) − 2β(1, η) − 2β(1) − d, where d ≥ 0 is a constant chosen by the attacker."
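The textual description quoted above can be sketched as code. Note this is our own illustration, not the paper's implementation: the exact form of the confidence width β(·) is an assumption loosely modeled on Jun et al. (2018), and the paper's definition may differ.

```python
import numpy as np

def beta(n, sigma=0.1, n_arms=10, delta=0.05):
    # Assumed high-probability confidence width in the style of Jun et al.
    # (2018); the paper's exact definition of beta may differ.
    return np.sqrt((2 * sigma**2 / n) * np.log(np.pi**2 * n_arms * n**2 / (3 * delta)))

def initial_round_attack(rewards, eta=0.05, d=0.1, sigma=0.1):
    """First-N-rounds attack sketch: after each arm has been pulled once,
    corrupt every non-target arm i > 1 just enough that its empirical mean
    sits at least 2*beta(1, eta) + 2*beta(1) + d below the target arm's."""
    n_arms = len(rewards)
    mu_hat = np.asarray(rewards, dtype=float).copy()  # one pull each: mean = reward
    corruptions = np.zeros(n_arms)
    margin = 2 * beta(1, sigma, n_arms, eta) + 2 * beta(1, sigma, n_arms) + d
    ceiling = mu_hat[0] - margin      # arm 1 (index 0) is the attacker's target
    for i in range(1, n_arms):
        if mu_hat[i] > ceiling:       # minimal attack cost: push down just enough
            corruptions[i] = mu_hat[i] - ceiling
            mu_hat[i] = ceiling
    return mu_hat, corruptions
```

Spending only the minimal cost is what keeps the attack stealthy: each corrupted empirical mean lands exactly on the threshold rather than far below it.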
Open Source Code | No | The paper does not contain any statements about releasing code for the methodology, nor does it provide links to any code repositories.
Open Datasets | No | The paper states, "We varied the number of arms N in {10, 30} and set each arm's reward distribution to an independent Gaussian distribution. The ground-truth mean reward µi of each arm i is sampled from N(0, 1)." This indicates a simulated environment rather than a publicly available dataset with concrete access information.
Dataset Splits | No | The paper mentions, "We run each experiment for T = 10000 (for N = 10) or 20000 (for N = 30) iterations and repeat each experiment 20 times to report the mean and variance of performance," but does not specify train/validation/test dataset splits. The experiments are based on simulations with randomly sampled reward distributions rather than fixed datasets.
Hardware Specification | No | The paper states, "We performed extensive empirical evaluations using simulation," but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run these simulations.
Software Dependencies | No | The paper describes its methods and experiments but does not provide any specific software dependencies with version numbers, such as "Python 3.x" or "PyTorch 1.x".
Experiment Setup | Yes | From the paper's Experiment Setup: "In our simulations, we execute the reward poisoning attack method proposed in (Jun et al. 2018) as our baseline and our attack algorithms against UCB1 and ϵ-greedy algorithms in the presence of attack detection. We varied the number of arms N in {10, 30} and set each arm's reward distribution to an independent Gaussian distribution. The ground-truth mean reward µi of each arm i is sampled from N(0, 1). For the ϵ-greedy algorithm, we set its exploration probability ϵ_t = min{1, CN/t}. We set C = 500 (any C > 3 suffices); this value is chosen only for the convenience of presenting the results. In all our experiments, we set the detection method's parameter δ to 0.05, the high-probability coefficient η to 0.05, and the reward's noise scale σ in the environment to 0.1. We run each experiment for T = 10000 (for N = 10) or 20000 (for N = 30) iterations and repeat each experiment 20 times to report the mean and variance of performance."
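As a concrete illustration of this setup, the simulated environment and the ϵ-greedy exploration schedule can be sketched as follows. Variable names are ours, and the learner, attack, and detection components (δ = η = 0.05) are omitted; this is a reading of the reported configuration, not the authors' code.

```python
import numpy as np

def make_environment(n_arms, sigma=0.1, rng=None):
    """Gaussian bandit: ground-truth means mu_i ~ N(0, 1), noise scale sigma."""
    rng = rng if rng is not None else np.random.default_rng()
    mu = rng.normal(0.0, 1.0, size=n_arms)
    def pull(arm):
        # Reward = true mean plus Gaussian noise with scale sigma
        return mu[arm] + rng.normal(0.0, sigma)
    return mu, pull

def epsilon_t(t, n_arms, C=500):
    """Exploration probability eps_t = min{1, C*N/t}; any C > 3 suffices."""
    return min(1.0, C * n_arms / t)

# Configurations matching the reported runs: (N, T) pairs, 20 repetitions each
for n_arms, horizon in [(10, 10_000), (30, 20_000)]:
    for run in range(20):
        mu, pull = make_environment(n_arms, sigma=0.1)
        # ... run UCB1 / eps-greedy under attack and detection for `horizon` rounds
```

With C = 500 and N = 10, ϵ_t stays at 1 for the first CN = 5000 rounds and then decays as CN/t, so the learner explores heavily early on.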