EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation

Authors: Qi Zhou, Haipeng Chen, Yitao Zheng, Zhen Wang (pp. 14602-14611)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the effectiveness of EvaLDA via extensive empirical evaluations. For instance, in the NIPS dataset, EvaLDA can averagely promote the rank of a target topic from 10 to around 7 by only replacing 1% of the words with similar words in a victim document. Our contributions... iv) We conduct extensive empirical evaluations that prove the effectiveness of EvaLDA on two distinct datasets (i.e., NIPS and AP) and a large variety of problem settings. We conduct empirical experiments to evaluate EvaLDA.
Researcher Affiliation | Academia | Qi Zhou (1), Haipeng Chen (2), Yitao Zheng (1), Zhen Wang (1); (1) School of Cyberspace, Hangzhou Dianzi University, Hangzhou 310018, China; (2) Center for Research on Computation and Society & Department of Computer Science, Harvard University, Cambridge, MA 02138, USA; zhouqi@hdu.edu.cn, hpchen@seas.harvard.edu, zhengyitao@hdu.edu.cn, wangzhen@hdu.edu.cn
Pseudocode | Yes | Algorithm 1: EvaLDA
Open Source Code | Yes | The code of this paper can be found at https://github.com/tools-only/Evasion-Attack-against-LDA-Model.
Open Datasets | Yes | We evaluate EvaLDA on two different datasets, NIPS (https://www.kaggle.com/benhamner/nips-papers) and AP (https://github.com/Blei-Lab/lda-c/blob/master/example/ap.tgz).
Dataset Splits | No | The paper provides '#Train docs' and '#Test docs' statistics in Table 1 but does not specify a validation set or explicit training/validation/test split percentages, nor does it refer to standard predefined splits for these purposes.
Hardware Specification | Yes | All experiments are run in machines with Intel E5-2678 v3 and 100GB RAM.
Software Dependencies | No | The paper mentions 'We implement LDA-CGS using the lda package' but does not provide specific version numbers for this package or any other software dependencies, which are needed for reproducibility.
Experiment Setup | Yes | The hyperparameters of the two datasets are set as follows: the topic number is 120 for the NIPS dataset and 75 for the AP dataset. The number of training iterations is 5,000, which is enough to converge. We set the hyperparameters α and η of the Dirichlet distribution to the default values 0.1 and 0.01. Each test sample runs 500 iterations. For all settings, we set the word distance threshold σ = 0.6. The perturbation threshold κ (in Eq. (7)) ranges over [0.5%, 1%, 2%, 3%] (default κ = 1%). The original rank of the target topic ranges over [5, 10, 15, 20] (default 10).
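
As a concrete illustration of this setup, the following is a minimal sketch, assuming the Python lda package mentioned under Software Dependencies (exact version unknown) and a synthetic document-term matrix in place of the NIPS/AP corpora, of how the reported LDA-CGS hyperparameters (120 topics, 5,000 Gibbs iterations, α = 0.1, η = 0.01) would map onto a training run; the EvaLDA attack itself is not reproduced here.

```python
# Minimal sketch (not the authors' code): training LDA via collapsed Gibbs
# sampling with the Python "lda" package, using the hyperparameters reported
# in the paper. The document-term matrix below is synthetic placeholder data;
# the paper trains on the NIPS and AP corpora.
import numpy as np
import lda

N_TOPICS = 120   # 120 topics for NIPS (75 for AP)
N_ITER = 5000    # 5,000 Gibbs sampling iterations
ALPHA = 0.1      # Dirichlet prior on document-topic distributions
ETA = 0.01       # Dirichlet prior on topic-word distributions

# Placeholder document-term count matrix: (n_docs, vocab_size), non-negative ints.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 3000))

model = lda.LDA(n_topics=N_TOPICS, n_iter=N_ITER, alpha=ALPHA, eta=ETA,
                random_state=0)
model.fit(X)

# Per-document topic distributions; the "rank" of a target topic in a victim
# document (the quantity EvaLDA aims to promote) is its position when these
# probabilities are sorted in descending order.
doc_topic = model.doc_topic_
print(doc_topic[0].argsort()[::-1][:10])  # top-10 topics for document 0
```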