Collaborative Linear Bandits with Adversarial Agents: Near-Optimal Regret Bounds
Authors: Aritra Mitra, Arman Adibi, George J. Pappas, Hamed Hassani
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report simulation results on synthetic data to corroborate our developed theory. Additional simulations on contextual bandits and alternate attack models are presented in Appendix H. Experimental Setup. We consider a setting with 50 actions in R^5; we describe how these actions and θ∗ are generated in Appendix H. The rewards are generated based on the observation model in Eq. (1). We now describe the attack model. To manipulate the server into selecting sub-optimal arms, each adversarial agent i employs the simple strategy of reducing the rewards of the good arms and increasing the rewards of the bad arms. More precisely, in each epoch ℓ, upon pulling an arm a and observing the corresponding reward y_a, an adversarial agent i does the following: if y_a > p⟨θ∗, a⟩, then this reward is corrupted to ỹ_a = y_a − β; and if y_a ≤ p⟨θ∗, a⟩, then the reward is corrupted to ỹ_a = y_a + β. For this experiment, we fix p = 0.6 and β = 5. Agent i ∈ B then uses all the corrupted rewards in epoch ℓ to generate the local model estimate θ̂_i^(ℓ) that is transmitted to the server. Discussion of Simulation Results. Fig. 1 summarizes our experimental results. In Fig. 1(a), we compare our proposed algorithm RCLB to a vanilla distributed phased elimination (PE) algorithm that does not account for adversarial agents. Specifically, the latter is designed by replacing the median operation in line 8 of Algorithm 1 with a mean operation, and setting the threshold γ_ℓ in line 9 to be ϵ_ℓ. Fig. 1(a) shows that even a small fraction α = 0.1 of adversaries can cause the non-robust PE algorithm to incur linear regret. In contrast, RCLB continues to guarantee sub-linear regret despite adversarial corruptions. Furthermore, the regret of RCLB in the presence of a small fraction of adversarial agents is close to that of the non-robust algorithm in the absence of adversaries. This further establishes the robustness of RCLB. (A hedged sketch of this median-vs-mean aggregation contrast appears after the table.) |
| Researcher Affiliation | Academia | Aritra Mitra, Arman Adibi, George J. Pappas, Hamed Hassani. Department of Electrical and Systems Engineering, University of Pennsylvania. {amitra20, aadibi, hassani, pappasg}@seas.upenn.edu |
| Pseudocode | Yes | Algorithm 1: Robust Collaborative Phased Elimination for Linear Bandits (RCLB); Algorithm 2: Robust Base LinUCB (at Server); Algorithm 3: Robust Collaborative SupLinUCB for Contextual Bandits (at Server) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses "synthetic data" generated for the experiments and describes the process of generating this data in Appendix H, but it does not specify that this data is publicly available or provide concrete access information (link, DOI, citation for an established dataset). |
| Dataset Splits | No | The paper discusses an experimental setup using synthetic data but does not specify dataset splits (e.g., training, validation, test percentages or counts) or cross-validation setup. |
| Hardware Specification | No | The paper describes the experimental setup but does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions simulation results but does not specify any software dependencies with version numbers (e.g., Python, specific libraries, or solvers). |
| Experiment Setup | Yes | Experimental Setup. We consider a setting with 50 actions in R^5; we describe how these actions and θ∗ are generated in Appendix H. The rewards are generated based on the observation model in Eq. (1). We now describe the attack model. To manipulate the server into selecting sub-optimal arms, each adversarial agent i employs the simple strategy of reducing the rewards of the good arms and increasing the rewards of the bad arms. More precisely, in each epoch ℓ, upon pulling an arm a and observing the corresponding reward y_a, an adversarial agent i does the following: if y_a > p⟨θ∗, a⟩, then this reward is corrupted to ỹ_a = y_a − β; and if y_a ≤ p⟨θ∗, a⟩, then the reward is corrupted to ỹ_a = y_a + β. For this experiment, we fix p = 0.6 and β = 5. Agent i ∈ B then uses all the corrupted rewards in epoch ℓ to generate the local model estimate θ̂_i^(ℓ) that is transmitted to the server. (A hedged sketch of this corruption rule appears after the table.) |
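
The attack model quoted in the Research Type and Experiment Setup rows reduces to a simple threshold rule on each observed reward. Below is a minimal Python sketch of that rule, assuming rewards follow a linear observation model of the form y_a = ⟨θ∗, a⟩ + noise; the function name `corrupt_reward`, the noise model, and the random values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def corrupt_reward(y_a, theta_star, a, p=0.6, beta=5.0):
    """Attack rule from the quoted experimental setup: push rewards of good
    arms down and rewards of bad arms up.

    If y_a > p * <theta_star, a>, report y_a - beta; otherwise report
    y_a + beta.  The experiments fix p = 0.6 and beta = 5.
    """
    threshold = p * float(np.dot(theta_star, a))
    return y_a - beta if y_a > threshold else y_a + beta


# Illustrative usage with synthetic values (not the paper's data).
rng = np.random.default_rng(0)
theta_star = rng.normal(size=5)                       # unknown parameter in R^5
a = rng.normal(size=5)                                # one of the 50 actions
y_a = float(np.dot(theta_star, a)) + rng.normal()     # Eq. (1)-style noisy reward
y_tilde = corrupt_reward(y_a, theta_star, a)          # reward reported by an agent in B
```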
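
The Fig. 1(a) comparison hinges on the server's aggregation step: RCLB uses a median over agents, while the non-robust PE baseline uses a mean together with the smaller threshold ϵ_ℓ. The sketch below illustrates that contrast under the assumption that the server scores each active arm by aggregating the per-agent reward estimates ⟨θ̂_i^(ℓ), a⟩; the exact quantity aggregated in line 8 of Algorithm 1 and the value of γ_ℓ are not reproduced in the excerpts above, so this is a sketch of the idea rather than the authors' algorithm.

```python
import numpy as np

def eliminate_arms(theta_hats, actions, threshold, robust=True):
    """Sketch of a phased-elimination step at the server.

    theta_hats : (M, d) array of local model estimates, one row per agent
                 (adversarial agents may have corrupted theirs).
    actions    : (K, d) array of currently active arms.
    threshold  : gamma_l for the robust variant, eps_l for the baseline.
    robust     : True  -> median over agents (RCLB-style aggregation),
                 False -> mean over agents (non-robust PE baseline).
    """
    per_agent_scores = actions @ theta_hats.T          # (K, M) estimated rewards
    agg = np.median if robust else np.mean
    scores = agg(per_agent_scores, axis=1)             # aggregate across agents
    keep = scores >= scores.max() - threshold          # retain near-optimal arms
    return actions[keep]
```

With a small fraction of corrupted rows in `theta_hats`, the median-based scores stay close to the clean estimates, whereas the mean can be dragged arbitrarily far by a single adversarial agent, which is the qualitative behavior reported for Fig. 1(a).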