Black-Box Data Poisoning Attacks on Crowdsourcing
Authors: Pengpeng Chen, Yongqiang Yang, Dingqi Yang, Hailong Sun, Zhijun Chen, Peng Lin
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Experiments: This section presents our experimental results for evaluating the effectiveness of the attack strategy obtained by our proposed approach SubPac. Specifically, we answer the following questions: Q1: How well does the attack strategy perform in subverting the label aggregation result with varying proportions of malicious labels? Q2: How well does the attack strategy perform in disguising malicious behaviors in gold tests? Q3: How effective is the attack strategy in attacking different target models, and which are good substitute models? Q4: How effective is the attack strategy with limited access to normal labels? (A toy simulation of the Q1 setting appears after this table.) |
| Researcher Affiliation | Academia | Pengpeng Chen (1,2,5), Yongqiang Yang (2,4), Dingqi Yang (3), Hailong Sun (2,4), Zhijun Chen (2,4), Peng Lin (1,5). (1) China Aviation System Engineering Research Institute, Beijing, China; (2) SKLSDE Lab, Beihang University, Beijing, China; (3) State Key Laboratory of Internet of Things for Smart City and Department of Computer and Information Science, University of Macau, Macau SAR, China; (4) Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, China; (5) Chinese Aeronautical Establishment, Beijing, China |
| Pseudocode | Yes | Algorithm 1: SubPac |
| Open Source Code | Yes | Our code is available at https://github.com/yongqiangyang/SubPac. |
| Open Datasets | Yes | We experiment with the following four real-world datasets. 1) Temp [Snow et al., 2008]:... 2) rte [Snow et al., 2008]:... 3) sentiment [Zheng et al., 2017]:... 4) ER [Wang et al., 2012]:... |
| Dataset Splits | No | The paper describes how malicious data is generated and integrated with normal labels but does not provide explicit training, validation, or test dataset splits in typical machine learning terms (e.g., percentages or counts for distinct sets used for training/validation/testing phases of a model). |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU or GPU models used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, required to replicate the experiments. |
| Experiment Setup | Yes | Details of parameter settings: the budget B possessed by the attacker is set to B = M × N′, where M is the number of malicious workers and N′ is the number of instances labeled by each malicious worker. Each element of T is initialized to 1. Each element of Y is initialized as a random option different from the aggregated result. We consider the scenario where the proportion of malicious workers is very low, namely M = 5. The attacker has a limited budget; thus we set N′ = 0.5N, where N is the total number of instances. We use cross-entropy for the discrepancy function v(p, q). (A short parameter sketch follows this table.) |
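The Q1 row above asks how far malicious labels can shift the aggregated result. As a minimal, self-contained illustration of that setting (not the paper's SubPac algorithm), the Python sketch below injects M malicious workers, each labeling a fraction of instances with a random option different from the current majority-vote aggregate, mirroring the initialization quoted in the Experiment Setup row; the toy data, worker accuracy, and all function names are hypothetical.

```python
import numpy as np

def majority_vote(labels):
    """Aggregate each instance's list of worker labels by majority vote."""
    return {i: int(np.bincount(v).argmax()) for i, v in labels.items()}

def accuracy(agg, truth):
    """Fraction of instances whose aggregated label matches ground truth."""
    return float(np.mean([agg[i] == t for i, t in enumerate(truth)]))

def inject_malicious(labels, m_workers, frac, n_options, rng):
    """Add m_workers malicious workers; each labels a random fraction
    `frac` of the instances with an option different from the current
    majority-vote aggregate (the initialization quoted above)."""
    agg = majority_vote(labels)
    instances = np.array(list(labels))
    poisoned = {i: list(v) for i, v in labels.items()}
    per_worker = int(frac * len(instances))
    for _ in range(m_workers):
        for i in rng.choice(instances, size=per_worker, replace=False):
            wrong = [o for o in range(n_options) if o != agg[i]]
            poisoned[i].append(int(rng.choice(wrong)))
    return poisoned

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=200)  # toy binary task, N = 200 instances
clean = {i: [int(t) if rng.random() < 0.8 else 1 - int(t)  # 10 honest workers,
             for _ in range(10)]                           # each 80% accurate
         for i, t in enumerate(truth)}

print("clean accuracy:   ", accuracy(majority_vote(clean), truth))
poisoned = inject_malicious(clean, m_workers=5, frac=0.5, n_options=2, rng=rng)
print("poisoned accuracy:", accuracy(majority_vote(poisoned), truth))
```

Sweeping `frac` or `m_workers` in this toy reproduces the "varying proportion of malicious labels" axis of Q1; the random-wrong-option attack here is only a naive baseline against which an optimized strategy like the paper's would be compared.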
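For the Experiment Setup row, the sketch below spells out the budget bookkeeping and a cross-entropy discrepancy v(p, q). The paper only states that cross-entropy is used; the vector form, the symbol N′, and the concrete value N = 200 are assumptions introduced here because the extracted text collapsed the paper's two instance-count symbols into one.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Discrepancy v(p, q) between two label distributions. The paper
    states cross-entropy is used; this vector form is an assumption."""
    p = np.asarray(p, dtype=float)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(-np.sum(p * np.log(q)))

# Budget bookkeeping under the reported settings (N = 200 is an example value).
M = 5             # number of malicious workers (kept deliberately low)
N = 200           # total number of instances
N_prime = N // 2  # instances labeled per malicious worker, N' = 0.5 * N
B = M * N_prime   # attacker budget B = M * N'

print(B)                                      # 500
print(cross_entropy([1.0, 0.0], [0.7, 0.3]))  # ~0.357 (= -ln 0.7)
```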