Document Informed Neural Autoregressive Topic Models with Distributional Prior

Authors: Pankaj Gupta, Yatin Chaudhary, Florian Buettner, Hinrich Schütze (pp. 6505–6512)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains.
Researcher Affiliation | Collaboration | (1) Corporate Technology, Machine-Intelligence (MIC-DE), Siemens AG, Munich, Germany; (2) CIS, University of Munich (LMU), Munich, Germany
Pseudocode | Yes | Algorithm 1: computation of log p(v) in iDocNADE or iDocNADEe using tree-softmax; Algorithm 2: computing gradients of log p(v) in iDocNADE or iDocNADEe using tree-softmax. (A hedged sketch of the autoregressive log-likelihood appears after the table.)
Open Source Code | Yes | Code and supplementary material are available at https://github.com/pgcool/iDocNADEe.
Open Datasets | Yes | We perform evaluations on 15 (8 short-text and 7 long-text) datasets of varying size with single/multi-class labeled documents from public as well as industrial corpora. See the supplementary material for the data description, hyperparameters and grid-search results for generalization and IR tasks. Table 1 shows the data statistics, where 20NS: 20 Newsgroups and R21578: Reuters21578.
Dataset Splits | Yes | Table 1: Data statistics of short and long texts as well as small and large corpora from various domains, with a state-of-the-art comparison in terms of PPL and IR (i.e., IR-precision) for short- and long-text datasets. Symbols: L: average text length in words, K: dictionary size, C: number of classes, Senti: sentiment, Avg: average, k: thousand; a marker symbol denotes multi-label data. PPL and IR (IR-precision) are computed over 200 topics (T200) at retrieval fraction 0.02. For short texts, L < 25. Underlined and bold numbers indicate the best scores on the PPL and retrieval tasks, respectively, in the FS setting. See Larochelle and Lauly (2012) for LDA (Blei, Ng, and Jordan 2003) performance in terms of PPL, where DocNADE outperforms LDA. (Columns for Train, Val, and Test splits are provided in the table.) A sketch of the PPL and IR-precision computations appears after the table.
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies.
Experiment Setup | No | See the supplementary material for the data description, hyperparameters and grid-search results for generalization and IR tasks.
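
The Pseudocode row refers to Algorithms 1 and 2, which compute log p(v) and its gradients with a tree (hierarchical) softmax. The sketch below is not the authors' code: it illustrates the standard DocNADE autoregressive log-likelihood (Larochelle and Lauly 2012) with a plain full softmax for readability, and all names (W, V, b, c, docnade_log_prob) are illustrative assumptions rather than identifiers from the released repository.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def docnade_log_prob(doc, W, V, b, c):
    """log p(v) for one document, given as a sequence of word indices `doc`.

    h_i           = sigmoid(c + sum_{k<i} W[:, v_k])   # hidden state from preceding words
    p(v_i | v_<i) = softmax(b + V @ h_i)[v_i]          # full softmax; the paper uses tree-softmax
    log p(v)      = sum_i log p(v_i | v_<i)
    """
    pre_activation = c.astype(float).copy()  # c plus running sum of embeddings of preceding words
    log_p = 0.0
    for v_i in doc:
        h = sigmoid(pre_activation)                   # conditioned only on words before position i
        log_p += np.log(softmax(b + V @ h)[v_i])      # log-probability of the current word
        pre_activation += W[:, v_i]                   # make v_i visible to later positions
    return log_p
```

iDocNADE runs the same recurrence in both directions (left-to-right and right-to-left) and combines the two log-likelihoods, and iDocNADEe additionally injects pretrained word embeddings as a distributional prior; the tree-softmax of Algorithms 1 and 2 replaces the full softmax above to make the per-word cost logarithmic in the vocabulary size.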
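The Dataset Splits row reports perplexity (PPL) and IR-precision at retrieval fraction 0.02. Below is a minimal sketch of how these two scores are commonly computed in the DocNADE literature; the exact averaging and similarity conventions used by the authors are given in their supplementary material, so the choices here (per-document length normalization, cosine similarity) are assumptions, and the function names are illustrative.

```python
import numpy as np

def perplexity(doc_log_probs, doc_lengths):
    """Per-word perplexity: exp( -mean_t( log p(v_t) / |v_t| ) )."""
    doc_log_probs = np.asarray(doc_log_probs, dtype=float)
    doc_lengths = np.asarray(doc_lengths, dtype=float)
    return float(np.exp(-np.mean(doc_log_probs / doc_lengths)))

def ir_precision(query_vecs, query_labels, train_vecs, train_labels, fraction=0.02):
    """Average retrieval precision at a fixed retrieval fraction.

    Each test document is a query; training documents are ranked by cosine
    similarity of their document representations, the top `fraction` are
    retrieved, and precision is the share of retrieved documents whose label
    matches the query's label, averaged over all queries.
    """
    train_labels = np.asarray(train_labels)
    train_norm = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    k = max(1, int(fraction * len(train_vecs)))
    precisions = []
    for q, y in zip(query_vecs, query_labels):
        sims = train_norm @ (q / np.linalg.norm(q))   # cosine similarity to every training doc
        top = np.argsort(-sims)[:k]                   # indices of the k most similar documents
        precisions.append(np.mean(train_labels[top] == y))
    return float(np.mean(precisions))
```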