The Inverse Regression Topic Model
Authors: Maxim Rabinovich, David Blei
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply these methods to a corpus of 73K Congressional press releases and another of 150K Yelp reviews, demonstrating that the IRTM outperforms both MNIR and supervised topic models on the prediction task. |
| Researcher Affiliation | Academia | Maxim Rabinovich (MR608@CAM.AC.UK), Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ; David M. Blei (BLEI@CS.PRINCETON.EDU), Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544 |
| Pseudocode | No | The paper describes algorithmic steps and updates with mathematical equations, but it does not include a formal pseudocode block or a clearly labeled 'Algorithm' section. |
| Open Source Code | No | The paper mentions 'open-source code is available' for supervised topic modeling approaches used for comparison, but it does not state that its own IRTM code is open-source or provide a link to it. |
| Open Datasets | Yes | The Amazon corpus... was obtained from the raw Multi-Domain Sentiment Dataset (Blitzer et al., 2007). The Yelp corpus... came from the Yelp Academic Dataset (Yelp, Inc., 2012). |
| Dataset Splits | Yes | In all cases, we fit to a training set (80% of the data), optimized the number of topics on an evaluation set (10%), and assessed on a test set (10%). (A split sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., 'Python 3.8, PyTorch 1.9') needed for replication. |
| Experiment Setup | Yes | In all cases, we fit to a training set (80% of the data), optimized the number of topics on an evaluation set (10%), and assessed on a test set (10%). These plots show that the IRTM's success is due to its use of topics; the best results on the complex press release corpora, for instance, came with K = 20. Further, the Press Releases (All) learning curve versus λ shows that a default value of λ = 1.0 fares about as well as the best choice of λ. Results with the other corpora were similar. ... Default values of λ = 1.0, α = 0.1, and η = 0.01 were used in all of these experiments. (A model-selection sketch using these defaults appears after the table.) |
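
For readers attempting to reproduce the protocol, below is a minimal sketch of the 80/10/10 partition quoted in the Dataset Splits row. The function name `split_corpus`, the fixed seed, and the shuffle step are illustrative assumptions; the paper does not release its splitting code.

```python
# Minimal sketch of the paper's 80% train / 10% evaluation / 10% test split.
# The shuffle, the fixed seed, and the in-memory `documents` list are
# assumptions for illustration; the authors' actual splitting code is not public.
import random

def split_corpus(documents, seed=0):
    """Partition a corpus into 80% train, 10% evaluation, 10% test."""
    rng = random.Random(seed)
    docs = list(documents)
    rng.shuffle(docs)
    n_train = int(0.8 * len(docs))
    n_eval = int(0.1 * len(docs))
    return (docs[:n_train],
            docs[n_train:n_train + n_eval],
            docs[n_train + n_eval:])

# Example: split 100 toy documents.
train, evaluation, test = split_corpus([f"doc-{i}" for i in range(100)])
assert (len(train), len(evaluation), len(test)) == (80, 10, 10)
```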
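
Similarly, the Experiment Setup row implies a simple model-selection loop: fit on the training set with the paper's defaults (λ = 1.0, α = 0.1, η = 0.01) and pick K by evaluation-set error. The sketch below assumes caller-supplied `fit_fn` and `error_fn` callables, since the IRTM implementation itself is not public; the candidate grid of K values is likewise hypothetical.

```python
# Hypothetical model-selection loop implied by the Experiment Setup row.
# `fit_fn` and `error_fn` are caller-supplied stand-ins for an IRTM fit and a
# held-out prediction-error metric; neither is released with the paper.

def select_num_topics(train, evaluation, fit_fn, error_fn,
                      candidate_ks=(5, 10, 20, 50),
                      lam=1.0, alpha=0.1, eta=0.01):
    """Return the K whose fitted model has the lowest evaluation-set error."""
    best_k, best_err = None, float("inf")
    for k in candidate_ks:
        model = fit_fn(train, num_topics=k, lam=lam, alpha=alpha, eta=eta)
        err = error_fn(model, evaluation)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

On the press-release corpora the paper reports its best results at K = 20, so a candidate grid that brackets that value is a reasonable starting point.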