On Privacy Protection of Latent Dirichlet Allocation Model Training

Authors: Fangyuan Zhao, Xuebin Ren, Shusen Yang, Xinyu Yang

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results on real-world datasets demonstrate the effectiveness of our proposed algorithms. We conduct experiments on several real-world datasets to demonstrate the effectiveness of our proposed algorithms.
Researcher Affiliation | Academia | 1) School of Computer Science and Technology, Xi'an Jiaotong University, China; 2) National Engineering Laboratory for Big Data Analytics, Xi'an Jiaotong University, China; 3) Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, China
Pseudocode | Yes | Algorithm 1 Privacy Monitoring for Each Sampling; Algorithm 2 Privacy Monitoring for CGS in LDA
Open Source Code | No | The paper does not include an unambiguous statement or a direct link to the source code for the methodology described in the paper. It only references a full version of the paper on arXiv.
Open Datasets | Yes | The datasets used in our experiment are: KOS [1]: contains 3430 blog entries from the dailykos website. NIPS [2]: contains 1740 research papers from the NIPS conference. Enron [3]: contains 0.5 million email messages from about 150 users. [1] http://archive.ics.uci.edu/ml/ [2] http://nips.djvuzone.org/txt.html [3] www.cs.cmu.edu/~enron
Dataset Splits | No | The paper mentions training and test sets but does not specify a validation set or any splits for one. "We extracted part of these datasets as our training datasets and the rest as the test sets." Table 1 provides "#. training docs" and "#. test docs" but no validation split.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., specific GPU/CPU models, memory specifications).
Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used in the experiments, which are necessary for reproducibility.
Experiment Setup | Yes | In our experiments, for all datasets, the topic number is set as 50 and the maximum iteration number of the CGS process in LDA model training is set as 300, which is sufficient for convergence on all three datasets. The hyperparameters α and β are set as 0.1 and 0.01, respectively.
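
For context on the training procedure whose privacy the paper monitors, the following is a minimal sketch of standard collapsed Gibbs sampling (CGS) for LDA, using the hyperparameter settings reported above (K = 50 topics, α = 0.1, β = 0.01, up to 300 iterations). The toy corpus, the function name cgs_lda, and the omission of any privacy monitoring or noise injection are assumptions made for illustration; this is not the authors' Algorithm 1 or 2.

# Minimal CGS-for-LDA sketch (assumed illustration, not the paper's code).
import numpy as np

def cgs_lda(docs, vocab_size, num_topics=50, alpha=0.1, beta=0.01, iters=300, seed=0):
    """docs: list of documents, each a list of word ids in [0, vocab_size)."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), num_topics))      # document-topic counts
    n_kw = np.zeros((num_topics, vocab_size))     # topic-word counts
    n_k = np.zeros(num_topics)                    # topic totals
    z = []                                        # topic assignment per token
    for d, doc in enumerate(docs):                # random initialization
        z_d = rng.integers(num_topics, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                       # remove current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # full conditional: p(z=k) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(num_topics, p=p / p.sum())
                z[d][i] = k                       # record the new assignment
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    # posterior point estimates of topic-word and document-topic distributions
    phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + vocab_size * beta)
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + num_topics * alpha)
    return phi, theta

# Toy usage with a 4-word vocabulary and 2 topics (much smaller than the paper's setup).
docs = [[0, 1, 2, 1], [2, 3, 3, 0]]
phi, theta = cgs_lda(docs, vocab_size=4, num_topics=2, iters=50)

In the paper's setting, each sampling step of this kind is what Algorithms 1 and 2 wrap with privacy monitoring; the sketch above shows only the unprotected baseline training loop.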