The Inverse Regression Topic Model

Authors: Maxim Rabinovich, David Blei

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply these methods to a corpus of 73K Congressional press releases and another of 150K Yelp reviews, demonstrating that the IRTM outperforms both MNIR and supervised topic models on the prediction task.
Researcher Affiliation | Academia | Maxim Rabinovich (MR608@CAM.AC.UK), Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ; David M. Blei (BLEI@CS.PRINCETON.EDU), Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544.
Pseudocode | No | The paper describes algorithmic steps and updates with mathematical equations, but it does not include a formal pseudocode block or a clearly labeled 'Algorithm' section.
Open Source Code | No | The paper mentions that 'open-source code is available' for the supervised topic modeling approaches used for comparison, but it does not state that its own IRTM code is open source or provide a link to it.
Open Datasets | Yes | The Amazon corpus... was obtained from the raw Multi-Domain Sentiment Dataset (Blitzer et al., 2007). The Yelp corpus... came from the Yelp Academic Dataset (Yelp, Inc., 2012).
Dataset Splits | Yes | In all cases, we fit to a training set (80% of the data), optimized the number of topics on an evaluation set (10%), and assessed on a test set (10%). (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., 'Python 3.8, PyTorch 1.9') needed for replication.
Experiment Setup | Yes | In all cases, we fit to a training set (80% of the data), optimized the number of topics on an evaluation set (10%), and assessed on a test set (10%). These plots show that the IRTM's success is due to its use of topics; the best results on the complex press release corpora, for instance, came with K = 20. Further, the Press Releases (All) learning curve versus λ shows that a default value of λ = 1.0 fares about as well as the best choice of λ. Results with the other corpora were similar. ... Default values of λ = 1.0, α = 0.1, and η = 0.01 were used in all of these experiments. (A model-selection sketch follows the table.)
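
For concreteness, here is a minimal sketch of the 80%/10%/10% train/evaluation/test split described in the Dataset Splits row. It assumes documents arrive as a plain Python list; the `split_corpus` helper name and the seeded shuffle are illustrative choices, not the authors' code.

```python
import random

def split_corpus(docs, seed=0, train_frac=0.80, eval_frac=0.10):
    """Shuffle documents and partition them into train/eval/test sets
    in the paper's 80/10/10 proportions."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    docs = list(docs)
    rng.shuffle(docs)
    n = len(docs)
    n_train = int(train_frac * n)
    n_eval = int(eval_frac * n)
    train = docs[:n_train]
    evaluation = docs[n_train:n_train + n_eval]
    test = docs[n_train + n_eval:]  # remaining ~10%
    return train, evaluation, test

# Usage (corpus loading not shown):
# train, evaluation, test = split_corpus(press_releases)
```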
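Likewise, a hedged sketch of the model-selection loop the Experiment Setup row implies: fit on the training set for several topic counts K with the default hyperparameters (λ = 1.0, α = 0.1, η = 0.01), then keep the K that predicts best on the evaluation set. Because no IRTM implementation is released, the fitting and scoring routines are caller-supplied callables, and the candidate K values are illustrative (the paper reports K = 20 working best on the press release corpora).

```python
def select_num_topics(fit, score, train, evaluation,
                      candidate_ks=(5, 10, 20, 50)):
    """Return the K whose fitted model achieves the lowest prediction
    error on the evaluation set.

    `fit(train, num_topics=k, lam=..., alpha=..., eta=...)` and
    `score(model, evaluation)` are placeholders for a fitting routine
    and an error metric; candidate_ks values are illustrative.
    """
    best_k, best_err = None, float("inf")
    for k in candidate_ks:
        # Default hyperparameters from the paper: lambda=1.0, alpha=0.1, eta=0.01
        model = fit(train, num_topics=k, lam=1.0, alpha=0.1, eta=0.01)
        err = score(model, evaluation)  # e.g., held-out prediction error
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```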