Topic Modeling with Document Relative Similarities
Authors: Jianguang Du, Jing Jiang, Dandan Song, Lejian Liao
IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with two real-world datasets show that our model is able to learn meaningful topics. The results also show that our model outperforms the baselines in terms of topic coherence and a document classification task. |
| Researcher Affiliation | Academia | School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; School of Information Systems, Singapore Management University, Singapore |
| Pseudocode | Yes | Algorithm 1 Gibbs-EM for our model. (A hedged structural sketch of this loop appears after this table.) |
| Open Source Code | No | The paper states: "The version of sLDA we used was implemented in C++³ and our method was implemented in Java." and provides a footnote for sLDA's code. It does not provide any link to, or explicit statement about, the open-source availability of the code for the authors' own proposed method. |
| Open Datasets | Yes | We use two widely used text corpora, 20 newsgroups¹ and TDT2 [Cai et al., 2008]. The 20 newsgroups text corpus is a collection of approximately 20,000 newsgroup documents, partitioned evenly across 20 different newsgroups. We used a preprocessed version of this dataset², where the documents are divided into a training set and a test set. (...) ¹ http://qwone.com/~jason/20Newsgroups/ ² http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html |
| Dataset Splits | Yes | We used a preprocessed version of this dataset², where the documents are divided into a training set and a test set. (...) To test the robustness of our model, we used 5-fold cross validation for all methods. (An illustrative fold split is sketched after this table.) |
| Hardware Specification | No | The paper describes only a machine with 4 cores and 4 GB of memory. This does not identify specific hardware models (e.g., CPU or GPU) or give the detailed specifications needed for replication. |
| Software Dependencies | No | The paper states: "The version of sLDA we used was implemented in C++³ and our method was implemented in Java." While programming languages are mentioned, no specific software libraries or version numbers are provided for their method or the baselines. |
| Experiment Setup | Yes | In each run, we ran 100 iterations of Gibbs sampling and another 10 iterations of gradient descent. We set the Dirichlet prior β = 0.1, the variance of Gaussian prior σ = 1. We also fixed the number of topics to be 20 (same as the number of categories in each dataset). (These settings are wired into the Gibbs-EM sketch below.) |
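Since the paper reports pseudocode (Algorithm 1, Gibbs-EM) together with concrete run settings, the Java sketch below shows how the quoted configuration could map onto a generic Gibbs-EM loop. This is a minimal structural sketch, not the authors' implementation: the class name `GibbsEmSketch`, the data layout, and the `sampleTopic` and `gradientStep` placeholders are assumptions, and the placeholder bodies do not implement the paper's actual conditional distributions or gradient updates.

```java
import java.util.Random;

/**
 * Structural sketch of a Gibbs-EM loop using the run settings quoted in the
 * Experiment Setup row: 100 Gibbs iterations, 10 gradient-descent iterations,
 * beta = 0.1, sigma = 1, and K = 20 topics. Placeholder methods stand in for
 * the model-specific updates, which the paper defines but this report does not.
 */
public class GibbsEmSketch {
    static final int NUM_TOPICS = 20;     // K, matched to the category count
    static final double BETA = 0.1;       // Dirichlet prior on topic-word distributions
    static final double SIGMA = 1.0;      // variance of the Gaussian prior
    static final int GIBBS_ITERS = 100;   // Gibbs sampling sweeps per run
    static final int GRADIENT_ITERS = 10; // gradient-descent steps per run

    final Random rng = new Random();
    int[][] z; // topic assignment for each token of each document

    /** docs[d][n] is the vocabulary index of token n in document d. */
    void run(int[][] docs) {
        z = initAssignments(docs);
        // E-step-style phase: resample every token's topic assignment.
        for (int iter = 0; iter < GIBBS_ITERS; iter++) {
            for (int d = 0; d < docs.length; d++) {
                for (int n = 0; n < docs[d].length; n++) {
                    z[d][n] = sampleTopic(d, n, docs);
                }
            }
        }
        // M-step-style phase: gradient descent on the remaining parameters.
        for (int iter = 0; iter < GRADIENT_ITERS; iter++) {
            gradientStep();
        }
    }

    int[][] initAssignments(int[][] docs) {
        int[][] init = new int[docs.length][];
        for (int d = 0; d < docs.length; d++) {
            init[d] = new int[docs[d].length];
            for (int n = 0; n < docs[d].length; n++) {
                init[d][n] = rng.nextInt(NUM_TOPICS); // random initialization
            }
        }
        return init;
    }

    // Placeholder: the paper's conditional distribution would be sampled here.
    int sampleTopic(int d, int n, int[][] docs) {
        return rng.nextInt(NUM_TOPICS);
    }

    // Placeholder: the paper's gradient update would be applied here.
    void gradientStep() { }
}
```

The sketch runs the 100 Gibbs sweeps before the 10 gradient steps, matching the order in the quoted setup; whether the two phases alternate over multiple outer EM rounds is not stated in the quoted text.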
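The Dataset Splits row also reports 5-fold cross-validation for all methods. The snippet below is a generic, illustrative fold assignment; the corpus size, the fixed seed, and the commented-out `trainAndEvaluate` hook are placeholders, since the paper does not publish its splitting code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative 5-fold cross-validation split over document indices. */
public class FiveFoldCv {
    public static void main(String[] args) {
        int numDocs = 1000; // placeholder corpus size
        int k = 5;          // number of folds, as reported in the paper

        // Shuffle indices once so folds are random but reproducible.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numDocs; i++) indices.add(i);
        Collections.shuffle(indices, new Random(42));

        for (int fold = 0; fold < k; fold++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> test = new ArrayList<>();
            for (int i = 0; i < numDocs; i++) {
                // Every k-th shuffled index lands in the current test fold.
                if (i % k == fold) test.add(indices.get(i));
                else train.add(indices.get(i));
            }
            System.out.printf("fold %d: %d train / %d test%n",
                    fold, train.size(), test.size());
            // trainAndEvaluate(train, test); // model-specific, omitted here
        }
    }
}
```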