Discriminative Topic Modeling with Logistic LDA
Authors: Iryna Korshunova, Hanchen Xiong, Mateusz Fedoryszak, Lucas Theis
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Experiments While a lot of research has been done on models related to LDA, benchmarks have almost exclusively focused on either document classification or on a generative model's perplexity. However, here we are not only interested in logistic LDA's ability to discover the topics of documents but also those of individual items, as well as its ability to handle arbitrary types of inputs. We therefore explore two new benchmarks. First, we are going to look at a model's ability to discover the topics of tweets. Second, we are going to evaluate a model's ability to predict the categories of boards on Pinterest based on images. To connect with the literature on topic models and document classifiers, we are going to show that logistic LDA can also work well when applied to the task of document classification. |
| Researcher Affiliation | Collaboration | Iryna Korshunova Ghent University iryna.korshunova@ugent.be Hanchen Xiong Twitter hxiong@twitter.com Mateusz Fedoryszak Twitter mfedoryszak@twitter.com Lucas Theis Twitter ltheis@twitter.com |
| Pseudocode | Yes | Algorithm 1 Single step of discriminative training for a collection $\{x_{dn}\}_{n=1}^{N_d}$ with class label $c_d$. |
| Open Source Code | Yes | Our code and datasets are available at github.com/lucastheis/logistic_lda. |
| Open Datasets | Yes | Our code and datasets are available at github.com/lucastheis/logistic_lda. ... For our purpose, we used a subset of the Pinterest dataset of Geng et al. [11] ... We apply logistic LDA with discriminative training (Section 5.2) to the standard benchmark problem of document classification on the 20-Newsgroups dataset [21]. |
| Dataset Splits | Yes | We split the first dataset into training (70%), validation (10%), and test (20%) sets such that tweets of each author were only contained in one of the sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like PyTorch 1.9 or CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | Tweets were embedded by averaging 300-dimensional skip-gram embeddings of words [26]. Logistic LDA applied a shallow MLP on top of these embeddings and was trained using a stochastic approximation to mean-field variational inference (Section 5.1). ... The hyperparameters were selected based on a 15% split from the training data and are listed in Appendix E. |
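The experiment setup above (averaging 300-dimensional skip-gram word embeddings and applying a shallow MLP on top) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' released code: the hidden width, topic count, toy vocabulary, and the helper names `embed_tweet` and `topic_logits` are all assumptions for demonstration.

```python
import numpy as np

EMBED_DIM = 300   # skip-gram embedding size stated in the paper
HIDDEN_DIM = 128  # hypothetical hidden width
NUM_TOPICS = 10   # hypothetical number of topics

rng = np.random.default_rng(0)

# Toy vocabulary of random vectors, standing in for pretrained
# skip-gram (word2vec) embeddings.
vocab = {w: rng.standard_normal(EMBED_DIM) for w in ["cat", "sat", "mat"]}

def embed_tweet(tokens):
    """Embed a tweet by averaging the embeddings of in-vocabulary words."""
    vecs = [vocab[t] for t in tokens if t in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMBED_DIM)

# Shallow MLP: one ReLU hidden layer, then a linear layer producing
# per-topic logits for the tweet embedding.
W1 = rng.standard_normal((EMBED_DIM, HIDDEN_DIM)) * 0.01
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.standard_normal((HIDDEN_DIM, NUM_TOPICS)) * 0.01
b2 = np.zeros(NUM_TOPICS)

def topic_logits(tokens):
    h = np.maximum(embed_tweet(tokens) @ W1 + b1, 0.0)  # ReLU
    return h @ W2 + b2

logits = topic_logits(["cat", "sat", "mat"])
print(logits.shape)  # (10,)
```

In the paper these logits would feed the stochastic mean-field variational updates of Section 5.1 rather than a plain softmax classifier; the sketch only covers the embedding-and-MLP front end described in the quoted setup.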