A Correlated Topic Model Using Word Embeddings

Authors: Guangxu Xun, Yaliang Li, Wayne Xin Zhao, Jing Gao, Aidong Zhang

IJCAI 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our model on the 20 Newsgroups dataset and the Reuters-21578 dataset qualitatively and quantitatively. The experimental results show the effectiveness of our proposed model." |
| Researcher Affiliation | Academia | "1 Department of Computer Science and Engineering, SUNY at Buffalo, NY, USA; 2 School of Information, Renmin University of China, Beijing, China; 3 Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China" |
| Pseudocode | No | The paper describes the generative process and parameter-inference steps in prose and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides no statement about, or link to, open-source code for the described methodology. |
| Open Datasets | Yes | "In this section, we carry out experiments on two real-world text collections: the 20 Newsgroups dataset [1] and the Reuters-21578 dataset [2]." [1] www.qwone.com/~jason/20Newsgroups/ [2] www.daviddlewis.com/resources/testcollections/reuters21578/ |
| Dataset Splits | No | The paper mentions using the 20 Newsgroups and Reuters-21578 datasets, but it does not specify train, validation, or test splits (e.g., percentages or sample counts) for reproducibility; the datasets appear to be used whole for evaluation tasks such as topic coherence and document clustering. |
| Hardware Specification | No | The paper gives no details about the hardware used to run the experiments (e.g., CPU or GPU models, or memory specifications). |
| Software Dependencies | No | The paper mentions using Word2Vec, but it provides no version numbers for Word2Vec or for any other software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | "In the experiment, we set the dimensionality of word embeddings to 100, and the context window size to 12. We train word embeddings for 100 epochs. For uniformity, all the models are implemented with Gibbs sampling and run for 100 iterations. The Gaussian topic hyperparameter µ0 is set to the sample mean of all the word vectors, the initial degree of freedom ν0 to the dimensionality of word embeddings, and Ψ0 to an identity matrix. We set the number of topics K to the number of categories." |
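Both corpora cited in the Open Datasets row are publicly available through common toolkits. The following is a minimal loading sketch, assuming scikit-learn for 20 Newsgroups and NLTK's ApteMod distribution of Reuters-21578; the paper does not state how the authors obtained or preprocessed the data, so these library choices are illustrative assumptions.

```python
# Hedged sketch: fetch the two evaluation corpora from public sources.
# Library choices (scikit-learn, NLTK) are assumptions, not the authors' setup.
from sklearn.datasets import fetch_20newsgroups
import nltk

# 20 Newsgroups (www.qwone.com/~jason/20Newsgroups/); strip headers/footers/quotes
# so topics are learned from body text only (a common, assumed preprocessing step).
newsgroups = fetch_20newsgroups(subset="all",
                                remove=("headers", "footers", "quotes"))
docs_20ng = newsgroups.data       # list of raw document strings
labels_20ng = newsgroups.target   # category ids; K is set to the category count

# Reuters-21578 via NLTK's ApteMod packaging of the same collection
# (www.daviddlewis.com/resources/testcollections/reuters21578/).
nltk.download("reuters")
from nltk.corpus import reuters
docs_reuters = [reuters.raw(fid) for fid in reuters.fileids()]
```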
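The Experiment Setup row reports concrete hyperparameters, which can be mirrored in code. Below is a hedged sketch of that configuration, assuming gensim (4.x) as the Word2Vec trainer and NumPy for the Normal-inverse-Wishart prior; the paper names Word2Vec but no specific library, and the tokenization and `min_count` choices here are assumptions. It reuses `docs_20ng` and `labels_20ng` from the loading sketch above.

```python
# Hedged sketch of the reported experiment setup; not the authors' actual code.
import numpy as np
from gensim.models import Word2Vec

# Assumed tokenization: lowercase + whitespace split.
tokenized_docs = [doc.lower().split() for doc in docs_20ng]

# Reported settings: 100-dim embeddings, context window 12, 100 training epochs.
w2v = Word2Vec(sentences=tokenized_docs, vector_size=100, window=12,
               epochs=100, min_count=5, workers=4)
word_vectors = w2v.wv.vectors        # (V, 100) embedding matrix
dim = word_vectors.shape[1]

# Gaussian topic prior (Normal-inverse-Wishart), as reported in the paper:
mu_0 = word_vectors.mean(axis=0)     # µ0 = sample mean of all word vectors
nu_0 = dim                           # ν0 = dimensionality of word embeddings
psi_0 = np.eye(dim)                  # Ψ0 = identity matrix

K = len(set(labels_20ng))            # number of topics = number of categories
n_gibbs_iters = 100                  # all models run Gibbs sampling for 100 iterations
```

Note that setting ν0 to the embedding dimensionality is the weakest proper choice for an inverse-Wishart degrees-of-freedom parameter, which is consistent with the paper's stated initialization.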