Grounding Topic Models with Knowledge Bases

Authors: Zhiting Hu, Gang Luo, Mrinmaya Sachan, Eric Xing, Zaiqing Nie

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate the effectiveness of our approach. LGSA significantly improves topic quality in terms of perplexity. We apply the model to identify key entities of documents (e.g., the dominant figures of a news article). LGSA achieves a 10% improvement (precision@1 from 80% to 90%) over the best-performing competitors, showing strong potential in semantic search and knowledge acquisition.
Researcher Affiliation | Collaboration | Microsoft Research, Beijing, China; Microsoft, California, USA; School of Computer Science, Carnegie Mellon University. {zhitingh,mrinmays,epxing}@cs.cmu.edu, {gluo,znie}@microsoft.com
Pseudocode | Yes | Algorithm 1: Generative Process for LGSA
Open Source Code | No | The paper does not provide any links to open-source code, nor does it explicitly state that the code for the methodology has been released.
Open Datasets | Yes | NYT news is a widely used large corpus from LDC [1]. For both datasets, we extract the mentions of each article using a mention annotation tool, The Wiki Machine [2]. We use the Wikipedia snapshot of 04/02/2014 as our KB. Footnote 1: https://www.ldc.upenn.edu
Dataset Splits | Yes | We use 5-fold cross-validation testing.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies | No | The paper mentions 'The Wiki Machine' as a tool, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | The Dirichlet hyperparameters are set to fixed values: α = 50/K, β = 0.01, a common setting in topic modeling. We investigate the effects of the λ hyperparameters in our empirical studies.
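The α = 50/K, β = 0.01 prior setting and the perplexity metric reported above can be sketched in a few lines. This is a minimal illustration only: the paper confirms the prior values and the use of perplexity, but the topic count K and the held-out log-likelihood below are invented example numbers, not values from the paper.

```python
import math

# Symmetric Dirichlet priors as reported in the paper: alpha = 50/K, beta = 0.01.
K = 100                  # number of topics (illustrative; not a value from the paper)
alpha = 50.0 / K         # document-topic prior
beta = 0.01              # topic-word prior

def perplexity(held_out_log_likelihood: float, num_tokens: int) -> float:
    """Standard held-out perplexity for topic models: exp(-log p(w) / N).
    Lower values indicate a better model."""
    return math.exp(-held_out_log_likelihood / num_tokens)

# Illustrative numbers only:
print(alpha, beta)                            # 0.5 0.01
print(round(perplexity(-70000.0, 10000), 2))  # 1096.63
```

With K = 100 the prior works out to α = 0.5; note that under this convention α shrinks as the number of topics grows, encouraging sparser document-topic distributions for larger K.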