When Are Linear Stochastic Bandits Attackable?
Authors: Huazheng Wang, Haifeng Xu, Hongning Wang
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments further validate the effectiveness and cost-efficiency of the proposed attack method. and 5. Experiments We use simulation-based experiments to validate the effectiveness and cost-efficiency of our proposed attack methods. |
| Researcher Affiliation | Academia | Huazheng Wang * 1 Haifeng Xu 2 Hongning Wang 2 *Most work was done while with University of Virginia. 1Princeton University 2 University of Virginia. |
| Pseudocode | Yes | Algorithm 1 Two-stage Null Space Attack and Algorithm 2 Oracle Null Space Attack |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | In our simulations, we generate a size-k arm pool A, in which each arm a is associated with a context vector xa Rd. Each dimension of xa is drawn from a set of zero-mean Gaussian distributions with variances sampled from a uniform distribution U(0, 1). Each xa is then normalized to xa 2 = 1. The bandit model parameter θ is sampled from N(0, 1) and normalized to θ 2 = 1. |
| Dataset Splits | No | The paper describes simulation-based experiments where data is generated for each run. It does not provide explicit dataset split percentages, sample counts, or references to predefined train/validation/test splits of a larger dataset. |
| Hardware Specification | No | The paper mentions running 'simulation-based experiments' but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for these simulations. |
| Software Dependencies | No | The paper refers to standard algorithms like Lin UCB and Robust Phase Elimination but does not specify any programming languages, libraries, or software environments with version numbers used for the implementation or experiments. |
| Experiment Setup | Yes | In our simulations, we generate a size-k arm pool A, in which each arm a is associated with a context vector xa Rd. Each dimension of xa is drawn from a set of zero-mean Gaussian distributions with variances sampled from a uniform distribution U(0, 1). Each xa is then normalized to xa 2 = 1. The bandit model parameter θ is sampled from N(0, 1) and normalized to θ 2 = 1. We set d to 10, the standard derivation σ of Gaussian noise ηt and ηt to 0.1, and the arm pool size k to 30 in our simulations. We run the experiment for T = 10, 000 iterations. and Specifically, when T1 = T 1/2, the strategy gives the minimum attack cost in the order of O(T 3/4), and the non-target arms are pulled at most O(T 3/4) rounds. and For any large enough T1, the attack strategy in Algorithm 1 will correctly assert the attackability with high probability. Moreover, when the environment is attackable, with probability at least 1 2δ the attack strategy will fool Robust Ph E to pull non-target arms at most T log(T/δ) + d T log(T) log(1/δ)/ T1 + T 2 1 /ϵ rounds. and Specifically, setting T1 = T 2/5 gives the minimum attack cost order O(T 4/5) and the non-target arms are pulled at most O(T 4/5) rounds. |