Grounding Topic Models with Knowledge Bases
Authors: Zhiting Hu, Gang Luo, Mrinmaya Sachan, Eric Xing, Zaiqing Nie
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the effectiveness of our approach. LGSA significantly improves topic quality in terms of perplexity. We apply the model to identify key entities of documents (e.g., the dominant figures of a news article). LGSA achieves a 10% improvement (precision@1 from 80% to 90%) over the best-performing competitors, showing strong potential in semantic search and knowledge acquisition. |
| Researcher Affiliation | Collaboration | Microsoft Research, Beijing, China; Microsoft, California, USA; School of Computer Science, Carnegie Mellon University. {zhitingh,mrinmays,epxing}@cs.cmu.edu, {gluo,znie}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Generative Process for LGSA |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that the code for the methodology is released. |
| Open Datasets | Yes | NYT news is a widely-used large corpus from LDC (https://www.ldc.upenn.edu). For both datasets, we extract the mentions of each article using the mention annotation tool The Wiki Machine. We use the Wikipedia snapshot of 04/02/2014 as our KB. |
| Dataset Splits | Yes | We use 5-fold cross-validation testing. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions 'The Wiki Machine' as a tool, but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | The Dirichlet hyperparameters are set as fixed values: α = 50/K, β = 0.01, a common setting in topic modeling. We investigate the effects of the two λ hyperparameters in our empirical studies. |
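The evaluation protocol quoted above combines the common symmetric Dirichlet setting (α = 50/K, β = 0.01) with 5-fold cross-validation. A minimal sketch of that setup is shown below; the function name `five_fold_splits` and the choice of K are illustrative, not from the paper.

```python
# Common topic-model hyperparameter setting quoted in the table.
# K (number of topics) is a placeholder value, not specified in this excerpt.
K = 100
alpha = 50.0 / K   # symmetric document-topic Dirichlet prior
beta = 0.01        # symmetric topic-word Dirichlet prior

def five_fold_splits(docs, n_folds=5):
    """Yield (train, test) partitions for n-fold cross-validation.

    Documents are dealt round-robin into n_folds folds; each fold
    serves once as the held-out test set, as in 5-fold CV testing.
    """
    folds = [docs[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test
```

Each document appears in exactly one test fold across the five iterations, so held-out perplexity can be averaged over all folds.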