Topic Modeling with Document Relative Similarities

Authors: Jianguang Du, Jing Jiang, Dandan Song, Lejian Liao

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments with two real-world datasets show that our model is able to learn meaningful topics. The results also show that our model outperforms the baselines in terms of topic coherence and a document classification task.
Researcher Affiliation | Academia | School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; School of Information Systems, Singapore Management University, Singapore
Pseudocode | Yes | Algorithm 1 Gibbs-EM for our model. (A hedged Java sketch of a generic Gibbs-EM loop is given after this table.)
Open Source Code | No | The paper states: "The version of sLDA we used was implemented in C++ [3] and our method was implemented in Java." and provides a footnote for sLDA's code. It does not provide any link or explicit statement about the open-source availability of the authors' own code.
Open Datasets | Yes | We use two widely used text corpora, 20 Newsgroups [1] and TDT2 [Cai et al., 2008]. The 20 Newsgroups text corpus is a collection of approximately 20,000 newsgroup documents, partitioned evenly across 20 different newsgroups. We used a preprocessed version of this dataset [2], where the documents are divided into a training set and a test set. (...) [1] http://qwone.com/~jason/20Newsgroups/ [2] http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html
Dataset Splits | Yes | We used a preprocessed version of this dataset [2], where the documents are divided into a training set and a test set. (...) To test the robustness of our model, we used 5-fold cross validation for all methods.
Hardware Specification | No | The paper states only that experiments were run on a machine with 4 cores and 4GB of memory. This does not identify specific hardware models (e.g., CPU or GPU) or the detailed specifications needed for replication.
Software Dependencies | No | The paper states: "The version of sLDA we used was implemented in C++ [3] and our method was implemented in Java." While programming languages are mentioned, no specific software libraries or version numbers are given for their method or the baselines.
Experiment Setup | Yes | In each run, we ran 100 iterations of Gibbs sampling and another 10 iterations of gradient descent. We set the Dirichlet prior β = 0.1, the variance of the Gaussian prior σ = 1. We also fixed the number of topics to be 20 (same as the number of categories in each dataset). (The reported settings are collected in the cross-validation sketch after this table.)
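
The paper's Algorithm 1 is only named above, not reproduced. As a rough illustration of what a Gibbs-EM loop of this shape looks like, the following Java sketch alternates a standard collapsed Gibbs E-step for plain LDA with a placeholder gradient-descent M-step. The class, the lambda parameters, the ALPHA prior, the number of EM rounds, and the learning rate are our assumptions, not the authors' code; the paper's similarity objective is not reproduced here.

    import java.util.Random;

    // GibbsEMSketch: hypothetical illustration of the Gibbs-EM pattern
    // named in the paper's Algorithm 1. The E-step is a standard collapsed
    // Gibbs sampler for LDA; the M-step is a placeholder gradient update.
    public class GibbsEMSketch {
        static final int K = 20;             // topics (as reported)
        static final double ALPHA = 0.5;     // doc-topic prior (assumed; not reported)
        static final double BETA = 0.1;      // topic-word prior (as reported)
        static final int EM_ROUNDS = 5;      // outer EM rounds (assumed)
        static final int GIBBS_ITERS = 100;  // E-step sweeps per run (as reported)
        static final int GRAD_ITERS = 10;    // M-step steps per run (as reported)
        static final double LR = 0.01;       // learning rate (assumed)

        final int V;               // vocabulary size
        final int[][] docs;        // docs[d][n] = word id
        final int[][] z;           // current topic assignment per token
        final int[][] ndk;         // doc-topic counts
        final int[][] nkw;         // topic-word counts
        final int[] nk;            // tokens per topic
        final double[] lambda;     // stand-in for similarity parameters
        final Random rng = new Random(7);

        GibbsEMSketch(int[][] docs, int V) {
            this.docs = docs; this.V = V;
            z = new int[docs.length][];
            ndk = new int[docs.length][K];
            nkw = new int[K][V];
            nk = new int[K];
            lambda = new double[K];
            for (int d = 0; d < docs.length; d++) {   // random initialization
                z[d] = new int[docs[d].length];
                for (int n = 0; n < docs[d].length; n++) {
                    int k = rng.nextInt(K);
                    z[d][n] = k;
                    ndk[d][k]++; nkw[k][docs[d][n]]++; nk[k]++;
                }
            }
        }

        void run() {
            for (int round = 0; round < EM_ROUNDS; round++) {
                for (int i = 0; i < GIBBS_ITERS; i++) gibbsSweep();   // E-step
                for (int i = 0; i < GRAD_ITERS; i++) gradientStep();  // M-step
            }
        }

        // One full sweep of collapsed Gibbs sampling over all tokens.
        void gibbsSweep() {
            double[] p = new double[K];
            for (int d = 0; d < docs.length; d++) {
                for (int n = 0; n < docs[d].length; n++) {
                    int w = docs[d][n];
                    int old = z[d][n];
                    ndk[d][old]--; nkw[old][w]--; nk[old]--;  // exclude this token
                    double sum = 0;
                    for (int k = 0; k < K; k++) {
                        // Plain LDA conditional; the paper's model would also
                        // fold its similarity terms into this distribution.
                        p[k] = (ndk[d][k] + ALPHA) * (nkw[k][w] + BETA) / (nk[k] + V * BETA);
                        sum += p[k];
                    }
                    double u = rng.nextDouble() * sum;
                    int kNew = K - 1;
                    for (int k = 0; k < K; k++) {
                        u -= p[k];
                        if (u <= 0) { kNew = k; break; }
                    }
                    z[d][n] = kNew;
                    ndk[d][kNew]++; nkw[kNew][w]++; nk[kNew]++;
                }
            }
        }

        // Placeholder M-step: the real model runs gradient descent on the
        // parameters tied to the relative-similarity constraints.
        void gradientStep() {
            double[] grad = new double[K];  // would be d(objective)/d(lambda)
            for (int k = 0; k < K; k++) lambda[k] -= LR * grad[k];
        }

        public static void main(String[] args) {
            int[][] toy = { {0, 1, 2, 1}, {2, 3, 3, 0}, {4, 4, 1, 2} };
            GibbsEMSketch model = new GibbsEMSketch(toy, 5);
            model.run();
            System.out.println("topic of doc 0, token 0: " + model.z[0][0]);
        }
    }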
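
The reported experiment settings (β = 0.1, σ = 1, 20 topics, 100 Gibbs iterations, 10 gradient-descent iterations) and the 5-fold cross-validation protocol can be summarized in a small driver. This is a sketch under assumptions: trainAndEvaluate, the corpus size, and the shuffling seed are hypothetical, since no reference implementation is published with the paper.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    // FiveFoldCV: minimal sketch of the 5-fold cross-validation protocol,
    // with the reported hyperparameters collected in one place.
    public class FiveFoldCV {
        static final int FOLDS = 5;
        static final int TOPICS = 20;        // fixed to the number of categories
        static final double BETA = 0.1;      // Dirichlet prior (as reported)
        static final double SIGMA = 1.0;     // Gaussian prior variance (as reported)
        static final int GIBBS_ITERS = 100;  // per run (as reported)
        static final int GRAD_ITERS = 10;    // per run (as reported)

        public static void main(String[] args) {
            int numDocs = 1000;  // stand-in corpus size
            List<Integer> idx = new ArrayList<>();
            for (int i = 0; i < numDocs; i++) idx.add(i);
            Collections.shuffle(idx, new Random(7));

            double accSum = 0;
            for (int fold = 0; fold < FOLDS; fold++) {
                List<Integer> train = new ArrayList<>();
                List<Integer> test = new ArrayList<>();
                for (int i = 0; i < numDocs; i++) {
                    (i % FOLDS == fold ? test : train).add(idx.get(i));
                }
                accSum += trainAndEvaluate(train, test);
            }
            System.out.printf("mean accuracy over %d folds: %.4f%n", FOLDS, accSum / FOLDS);
        }

        // Hypothetical hook: train the topic model on `train` with the
        // constants above, then report classification accuracy on `test`.
        static double trainAndEvaluate(List<Integer> train, List<Integer> test) {
            return 0.0;  // placeholder
        }
    }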