EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation
Authors: Qi Zhou, Haipeng Chen, Yitao Zheng, Zhen Wang
AAAI 2021, pp. 14602–14611 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the effectiveness of EvaLDA via extensive empirical evaluations. For instance, on the NIPS dataset, EvaLDA can on average promote the rank of a target topic from 10 to around 7 by replacing only 1% of the words in a victim document with similar words. Our contributions... iv) We conduct extensive empirical evaluations that prove the effectiveness of EvaLDA on two distinct datasets (i.e., NIPS and AP) and a large variety of problem settings. We conduct empirical experiments to evaluate EvaLDA. |
| Researcher Affiliation | Academia | Qi Zhou,1 Haipeng Chen, 2 Yitao Zheng, 1 Zhen Wang 1 1 School of Cyberspace, Hangzhou Dianzi University, Hangzhou 310018, China 2 Center for Research on Computation and Society & Department of Computer Science, Harvard University, Cambridge 02138, MA, USA zhouqi@hdu.edu.cn, hpchen@seas.harvard.edu, zhengyitao@hdu.edu.cn, wangzhen@hdu.edu.cn |
| Pseudocode | Yes | Algorithm 1: EvaLDA |
| Open Source Code | Yes | The code of this paper can be found at https://github.com/tools-only/Evasion-Attack-against-LDA-Model. |
| Open Datasets | Yes | We evaluate EvaLDA on 2 different datasets, NIPS (https://www.kaggle.com/benhamner/nips-papers) and AP (https://github.com/Blei-Lab/lda-c/blob/master/example/ap.tgz). |
| Dataset Splits | No | The paper provides '#Train docs' and '#Test docs' statistics in Table 1 but does not specify a validation set or explicit percentages for training/validation/test splits, nor does it refer to standard predefined splits for these purposes. |
| Hardware Specification | Yes | All experiments are run in machines with Intel E5-2678 v3 and 100GB RAM. |
| Software Dependencies | No | The paper mentions 'We implement LDA-CGS using the lda package' but does not provide specific version numbers for this package or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | The hyperparameters of the two datasets are set as follows: the topic number is 120 for the NIPS dataset and 75 for the AP dataset. The number of training iterations is 5,000, which is enough to converge. We set the hyperparameters α and η of the Dirichlet distribution to the default values 0.1 and 0.01. Each test sample runs 500 iterations. For all settings, we set the word distance threshold σ = 0.6. The perturbation threshold κ (in Eq. (7)) ranges over [0.5%, 1%, 2%, 3%] (default κ = 1%), and the original rank of the target topic ranges over [5, 10, 15, 20] (default 10). |
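To make the reported setup concrete, the hyperparameters above can be collected into a single configuration, with a small helper for the per-document perturbation budget implied by κ. This is a minimal illustrative sketch: the names (`EXPERIMENT_CONFIG`, `perturbation_budget`) and the rounding choice are assumptions, not taken from the authors' code.

```python
import math

# Hyperparameters transcribed from the paper's experiment setup.
# All identifiers here are illustrative, not from the authors' repository.
EXPERIMENT_CONFIG = {
    "NIPS": {"n_topics": 120},
    "AP": {"n_topics": 75},
    "n_train_iter": 5000,   # training iterations ("enough to converge")
    "n_test_iter": 500,     # iterations per test sample
    "alpha": 0.1,           # Dirichlet hyperparameter for document-topic dist.
    "eta": 0.01,            # Dirichlet hyperparameter for topic-word dist.
    "sigma": 0.6,           # word distance threshold
    "kappa_grid": [0.005, 0.01, 0.02, 0.03],  # perturbation thresholds kappa
    "rank_grid": [5, 10, 15, 20],             # original target-topic ranks
}

def perturbation_budget(doc_len: int, kappa: float = 0.01) -> int:
    """Maximum number of word replacements allowed in a victim document of
    `doc_len` words under perturbation threshold kappa (Eq. (7) in the paper).
    Flooring is an assumption; the paper does not state the rounding rule."""
    return math.floor(doc_len * kappa)
```

For a 2,000-word document at the default κ = 1%, this budget allows 20 word replacements, matching the paper's headline result of perturbing only 1% of words.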