Topic Modeling with Document Relative Similarities

Authors: Jianguang Du, Jing Jiang, Dandan Song, Lejian Liao

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments with two real-world datasets show that our model is able to learn meaningful topics. The results also show that our model outperforms the baselines in terms of topic coherence and a document classification task.
Researcher Affiliation | Academia | School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; School of Information Systems, Singapore Management University, Singapore
Pseudocode | Yes | Algorithm 1 Gibbs-EM for our model. (A hedged Java sketch of a generic Gibbs-EM loop is given after this table.)
Open Source Code | No | The paper states: "The version of sLDA we used was implemented in C++ [3] and our method was implemented in Java." and provides a footnote for sLDA's code. It does not provide any link or explicit statement about the open-source availability of the authors' own code.
Open Datasets | Yes | We use two widely used text corpora, 20 Newsgroups [1] and TDT2 [Cai et al., 2008]. The 20 Newsgroups text corpus is a collection of approximately 20,000 newsgroup documents, partitioned evenly across 20 different newsgroups. We used a preprocessed version of this dataset [2], where the documents are divided into a training set and a test set. (...) [1] http://qwone.com/~jason/20Newsgroups/ [2] http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html
Dataset Splits | Yes | We used a preprocessed version of this dataset [2], where the documents are divided into a training set and a test set. (...) To test the robustness of our model, we used 5-fold cross validation for all methods.
Hardware Specification | No | The paper states only that experiments were run on a machine with 4 cores and 4GB of memory. This does not identify specific hardware models (e.g., CPU or GPU) or the detailed specifications needed for replication.
Software Dependencies | No | The paper states: "The version of sLDA we used was implemented in C++ [3] and our method was implemented in Java." While programming languages are mentioned, no specific software libraries or version numbers are given for their method or the baselines.
Experiment Setup | Yes | In each run, we ran 100 iterations of Gibbs sampling and another 10 iterations of gradient descent. We set the Dirichlet prior β = 0.1, the variance of the Gaussian prior σ = 1. We also fixed the number of topics to be 20 (same as the number of categories in each dataset). (The reported settings are collected in the cross-validation sketch after this table.)
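
The paper's Algorithm 1 is only named above, not reproduced. As a rough illustration of what a Gibbs-EM loop of this shape looks like, the following Java sketch alternates a standard collapsed Gibbs E-step for plain LDA with a placeholder gradient-descent M-step. The class, the lambda parameters, the ALPHA prior, the number of EM rounds, and the learning rate are our assumptions, not the authors' code; the paper's similarity objective is not reproduced here.

    import java.util.Random;

    // GibbsEMSketch: hypothetical illustration of the Gibbs-EM pattern
    // named in the paper's Algorithm 1. The E-step is a standard collapsed
    // Gibbs sampler for LDA; the M-step is a placeholder gradient update.
    public class GibbsEMSketch {
        static final int K = 20;             // topics (as reported)
        static final double ALPHA = 0.5;     // doc-topic prior (assumed; not reported)
        static final double BETA = 0.1;      // topic-word prior (as reported)
        static final int EM_ROUNDS = 5;      // outer EM rounds (assumed)
        static final int GIBBS_ITERS = 100;  // E-step sweeps per run (as reported)
        static final int GRAD_ITERS = 10;    // M-step steps per run (as reported)
        static final double LR = 0.01;       // learning rate (assumed)

        final int V;               // vocabulary size
        final int[][] docs;        // docs[d][n] = word id
        final int[][] z;           // current topic assignment per token
        final int[][] ndk;         // doc-topic counts
        final int[][] nkw;         // topic-word counts
        final int[] nk;            // tokens per topic
        final double[] lambda;     // stand-in for similarity parameters
        final Random rng = new Random(7);

        GibbsEMSketch(int[][] docs, int V) {
            this.docs = docs; this.V = V;
            z = new int[docs.length][];
            ndk = new int[docs.length][K];
            nkw = new int[K][V];
            nk = new int[K];
            lambda = new double[K];
            for (int d = 0; d < docs.length; d++) {   // random initialization
                z[d] = new int[docs[d].length];
                for (int n = 0; n < docs[d].length; n++) {
                    int k = rng.nextInt(K);
                    z[d][n] = k;
                    ndk[d][k]++; nkw[k][docs[d][n]]++; nk[k]++;
                }
            }
        }

        void run() {
            for (int round = 0; round < EM_ROUNDS; round++) {
                for (int i = 0; i < GIBBS_ITERS; i++) gibbsSweep();   // E-step
                for (int i = 0; i < GRAD_ITERS; i++) gradientStep();  // M-step
            }
        }

        // One full sweep of collapsed Gibbs sampling over all tokens.
        void gibbsSweep() {
            double[] p = new double[K];
            for (int d = 0; d < docs.length; d++) {
                for (int n = 0; n < docs[d].length; n++) {
                    int w = docs[d][n];
                    int old = z[d][n];
                    ndk[d][old]--; nkw[old][w]--; nk[old]--;  // exclude this token
                    double sum = 0;
                    for (int k = 0; k < K; k++) {
                        // Plain LDA conditional; the paper's model would also
                        // fold its similarity terms into this distribution.
                        p[k] = (ndk[d][k] + ALPHA) * (nkw[k][w] + BETA) / (nk[k] + V * BETA);
                        sum += p[k];
                    }
                    double u = rng.nextDouble() * sum;
                    int kNew = K - 1;
                    for (int k = 0; k < K; k++) {
                        u -= p[k];
                        if (u <= 0) { kNew = k; break; }
                    }
                    z[d][n] = kNew;
                    ndk[d][kNew]++; nkw[kNew][w]++; nk[kNew]++;
                }
            }
        }

        // Placeholder M-step: the real model runs gradient descent on the
        // parameters tied to the relative-similarity constraints.
        void gradientStep() {
            double[] grad = new double[K];  // would be d(objective)/d(lambda)
            for (int k = 0; k < K; k++) lambda[k] -= LR * grad[k];
        }

        public static void main(String[] args) {
            int[][] toy = { {0, 1, 2, 1}, {2, 3, 3, 0}, {4, 4, 1, 2} };
            GibbsEMSketch model = new GibbsEMSketch(toy, 5);
            model.run();
            System.out.println("topic of doc 0, token 0: " + model.z[0][0]);
        }
    }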
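
The reported experiment settings (β = 0.1, σ = 1, 20 topics, 100 Gibbs iterations, 10 gradient-descent iterations) and the 5-fold cross-validation protocol can be summarized in a small driver. This is a sketch under assumptions: trainAndEvaluate, the corpus size, and the shuffling seed are hypothetical, since no reference implementation is published with the paper.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    // FiveFoldCV: minimal sketch of the 5-fold cross-validation protocol,
    // with the reported hyperparameters collected in one place.
    public class FiveFoldCV {
        static final int FOLDS = 5;
        static final int TOPICS = 20;        // fixed to the number of categories
        static final double BETA = 0.1;      // Dirichlet prior (as reported)
        static final double SIGMA = 1.0;     // Gaussian prior variance (as reported)
        static final int GIBBS_ITERS = 100;  // per run (as reported)
        static final int GRAD_ITERS = 10;    // per run (as reported)

        public static void main(String[] args) {
            int numDocs = 1000;  // stand-in corpus size
            List<Integer> idx = new ArrayList<>();
            for (int i = 0; i < numDocs; i++) idx.add(i);
            Collections.shuffle(idx, new Random(7));

            double accSum = 0;
            for (int fold = 0; fold < FOLDS; fold++) {
                List<Integer> train = new ArrayList<>();
                List<Integer> test = new ArrayList<>();
                for (int i = 0; i < numDocs; i++) {
                    (i % FOLDS == fold ? test : train).add(idx.get(i));
                }
                accSum += trainAndEvaluate(train, test);
            }
            System.out.printf("mean accuracy over %d folds: %.4f%n", FOLDS, accSum / FOLDS);
        }

        // Hypothetical hook: train the topic model on `train` with the
        // constants above, then report classification accuracy on `test`.
        static double trainAndEvaluate(List<Integer> train, List<Integer> test) {
            return 0.0;  // placeholder
        }
    }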