Grounding Topic Models with Knowledge Bases

Authors: Zhiting Hu, Gang Luo, Mrinmaya Sachan, Eric Xing, Zaiqing Nie

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate the effectiveness of our approach. LGSA significantly improves topic quality in terms of perplexity. We apply the model to identify key entities of documents (e.g., the dominant figures of a news article). LGSA achieves a 10% improvement (precision@1 from 80% to 90%) over the best-performing competitors, showing strong potential in semantic search and knowledge acquisition.
Researcher Affiliation | Collaboration | Microsoft Research, Beijing, China; Microsoft, California, USA; School of Computer Science, Carnegie Mellon University. {zhitingh,mrinmays,epxing}@cs.cmu.edu, {gluo,znie}@microsoft.com
Pseudocode | Yes | Algorithm 1: Generative Process for LGSA
Open Source Code | No | The paper does not provide any links to open-source code, nor does it explicitly state that the code for the methodology has been released.
Open Datasets | Yes | NYT news is a widely used large corpus from LDC [1]. For both datasets, we extract the mentions of each article using a mention annotation tool, The Wiki Machine [2]. We use the Wikipedia snapshot of 04/02/2014 as our KB. Footnote 1: https://www.ldc.upenn.edu
Dataset Splits | Yes | We use 5-fold cross-validation testing.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies | No | The paper mentions 'The Wiki Machine' as a tool, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | The Dirichlet hyperparameters are set to fixed values: α = 50/K, β = 0.01, a common setting in topic modeling. We investigate the effects of the λ hyperparameters in our empirical studies.
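The α = 50/K, β = 0.01 prior setting and the perplexity metric reported above can be sketched in a few lines. This is a minimal illustration only: the paper confirms the prior values and the use of perplexity, but the topic count K and the held-out log-likelihood below are invented example numbers, not values from the paper.

```python
import math

# Symmetric Dirichlet priors as reported in the paper: alpha = 50/K, beta = 0.01.
K = 100                  # number of topics (illustrative; not a value from the paper)
alpha = 50.0 / K         # document-topic prior
beta = 0.01              # topic-word prior

def perplexity(held_out_log_likelihood: float, num_tokens: int) -> float:
    """Standard held-out perplexity for topic models: exp(-log p(w) / N).
    Lower values indicate a better model."""
    return math.exp(-held_out_log_likelihood / num_tokens)

# Illustrative numbers only:
print(alpha, beta)                            # 0.5 0.01
print(round(perplexity(-70000.0, 10000), 2))  # 1096.63
```

With K = 100 the prior works out to α = 0.5; note that under this convention α shrinks as the number of topics grows, encouraging sparser document-topic distributions for larger K.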