A Correlated Topic Model Using Word Embeddings

Authors: Guangxu Xun, Yaliang Li, Wayne Xin Zhao, Jing Gao, Aidong Zhang

IJCAI 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our model on the 20 Newsgroups dataset and the Reuters-21578 dataset qualitatively and quantitatively. The experimental results show the effectiveness of our proposed model." |
| Researcher Affiliation | Academia | "1 Department of Computer Science and Engineering, SUNY at Buffalo, NY, USA; 2 School of Information, Renmin University of China, Beijing, China; 3 Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China" |
| Pseudocode | No | The paper describes the generative process and parameter-inference steps in prose and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides no statement about, or link to, open-source code for the described methodology. |
| Open Datasets | Yes | "In this section, we carry out experiments on two real-world text collections: the 20 Newsgroups dataset [1] and the Reuters-21578 dataset [2]." [1] www.qwone.com/~jason/20Newsgroups/ [2] www.daviddlewis.com/resources/testcollections/reuters21578/ |
| Dataset Splits | No | The paper mentions using the 20 Newsgroups and Reuters-21578 datasets, but it does not specify train, validation, or test splits (e.g., percentages or sample counts) for reproducibility; the datasets appear to be used whole for evaluation tasks such as topic coherence and document clustering. |
| Hardware Specification | No | The paper gives no details about the hardware used to run the experiments (e.g., CPU or GPU models, or memory specifications). |
| Software Dependencies | No | The paper mentions using Word2Vec, but it provides no version numbers for Word2Vec or for any other software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | "In the experiment, we set the dimensionality of word embeddings to 100, and the context window size to 12. We train word embeddings for 100 epochs. For uniformity, all the models are implemented with Gibbs sampling and run for 100 iterations. The Gaussian topic hyperparameter µ0 is set to the sample mean of all the word vectors, the initial degree of freedom ν0 to the dimensionality of word embeddings, and Ψ0 to an identity matrix. We set the number of topics K to the number of categories." |
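Both corpora cited in the Open Datasets row are publicly available through common toolkits. The following is a minimal loading sketch, assuming scikit-learn for 20 Newsgroups and NLTK's ApteMod distribution of Reuters-21578; the paper does not state how the authors obtained or preprocessed the data, so these library choices are illustrative assumptions.

```python
# Hedged sketch: fetch the two evaluation corpora from public sources.
# Library choices (scikit-learn, NLTK) are assumptions, not the authors' setup.
from sklearn.datasets import fetch_20newsgroups
import nltk

# 20 Newsgroups (www.qwone.com/~jason/20Newsgroups/); strip headers/footers/quotes
# so topics are learned from body text only (a common, assumed preprocessing step).
newsgroups = fetch_20newsgroups(subset="all",
                                remove=("headers", "footers", "quotes"))
docs_20ng = newsgroups.data       # list of raw document strings
labels_20ng = newsgroups.target   # category ids; K is set to the category count

# Reuters-21578 via NLTK's ApteMod packaging of the same collection
# (www.daviddlewis.com/resources/testcollections/reuters21578/).
nltk.download("reuters")
from nltk.corpus import reuters
docs_reuters = [reuters.raw(fid) for fid in reuters.fileids()]
```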
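The Experiment Setup row reports concrete hyperparameters, which can be mirrored in code. Below is a hedged sketch of that configuration, assuming gensim (4.x) as the Word2Vec trainer and NumPy for the Normal-inverse-Wishart prior; the paper names Word2Vec but no specific library, and the tokenization and `min_count` choices here are assumptions. It reuses `docs_20ng` and `labels_20ng` from the loading sketch above.

```python
# Hedged sketch of the reported experiment setup; not the authors' actual code.
import numpy as np
from gensim.models import Word2Vec

# Assumed tokenization: lowercase + whitespace split.
tokenized_docs = [doc.lower().split() for doc in docs_20ng]

# Reported settings: 100-dim embeddings, context window 12, 100 training epochs.
w2v = Word2Vec(sentences=tokenized_docs, vector_size=100, window=12,
               epochs=100, min_count=5, workers=4)
word_vectors = w2v.wv.vectors        # (V, 100) embedding matrix
dim = word_vectors.shape[1]

# Gaussian topic prior (Normal-inverse-Wishart), as reported in the paper:
mu_0 = word_vectors.mean(axis=0)     # µ0 = sample mean of all word vectors
nu_0 = dim                           # ν0 = dimensionality of word embeddings
psi_0 = np.eye(dim)                  # Ψ0 = identity matrix

K = len(set(labels_20ng))            # number of topics = number of categories
n_gibbs_iters = 100                  # all models run Gibbs sampling for 100 iterations
```

Note that setting ν0 to the embedding dimensionality is the weakest proper choice for an inverse-Wishart degrees-of-freedom parameter, which is consistent with the paper's stated initialization.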