The Inverse Regression Topic Model
Authors: Maxim Rabinovich, David Blei
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply these methods to a corpus of 73K Congressional press releases and another of 150K Yelp reviews, demonstrating that the IRTM outperforms both MNIR and supervised topic models on the prediction task. |
| Researcher Affiliation | Academia | Maxim Rabinovich (MR608@CAM.AC.UK), Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ; David M. Blei (BLEI@CS.PRINCETON.EDU), Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544 |
| Pseudocode | No | The paper describes algorithmic steps and updates with mathematical equations, but it does not include a formal pseudocode block or a clearly labeled 'Algorithm' section. |
| Open Source Code | No | The paper mentions 'open-source code is available' for supervised topic modeling approaches used for comparison, but it does not state that its own IRTM code is open-source or provide a link to it. |
| Open Datasets | Yes | The Amazon corpus... was obtained from the raw Multi-Domain Sentiment Dataset (Blitzer et al., 2007). The Yelp corpus... came from the Yelp Academic Dataset (Yelp, Inc., 2012). |
| Dataset Splits | Yes | In all cases, we fit to a training set (80% of the data), optimized the number of topics on an evaluation set (10%), and assessed on a test set (10%). (A split sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., 'Python 3.8, PyTorch 1.9') needed for replication. |
| Experiment Setup | Yes | In all cases, we fit to a training set (80% of the data), optimized the number of topics on an evaluation set (10%), and assessed on a test set (10%). These plots show that the IRTM's success is due to its use of topics; the best results on the complex press release corpora, for instance, came with K = 20. Further, the Press Releases (All) learning curve versus λ shows that a default value of λ = 1.0 fares about as well as the best choice of λ. Results with the other corpora were similar. ... Default values of λ = 1.0, α = 0.1, and η = 0.01 were used in all of these experiments. (A model-selection sketch using these defaults appears after the table.) |
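
For readers attempting to reproduce the protocol, below is a minimal sketch of the 80/10/10 partition quoted in the Dataset Splits row. The function name `split_corpus`, the fixed seed, and the shuffle step are illustrative assumptions; the paper does not release its splitting code.

```python
# Minimal sketch of the paper's 80% train / 10% evaluation / 10% test split.
# The shuffle, the fixed seed, and the in-memory `documents` list are
# assumptions for illustration; the authors' actual splitting code is not public.
import random

def split_corpus(documents, seed=0):
    """Partition a corpus into 80% train, 10% evaluation, 10% test."""
    rng = random.Random(seed)
    docs = list(documents)
    rng.shuffle(docs)
    n_train = int(0.8 * len(docs))
    n_eval = int(0.1 * len(docs))
    return (docs[:n_train],
            docs[n_train:n_train + n_eval],
            docs[n_train + n_eval:])

# Example: split 100 toy documents.
train, evaluation, test = split_corpus([f"doc-{i}" for i in range(100)])
assert (len(train), len(evaluation), len(test)) == (80, 10, 10)
```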
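
Similarly, the Experiment Setup row implies a simple model-selection loop: fit on the training set with the paper's defaults (λ = 1.0, α = 0.1, η = 0.01) and pick K by evaluation-set error. The sketch below assumes caller-supplied `fit_fn` and `error_fn` callables, since the IRTM implementation itself is not public; the candidate grid of K values is likewise hypothetical.

```python
# Hypothetical model-selection loop implied by the Experiment Setup row.
# `fit_fn` and `error_fn` are caller-supplied stand-ins for an IRTM fit and a
# held-out prediction-error metric; neither is released with the paper.

def select_num_topics(train, evaluation, fit_fn, error_fn,
                      candidate_ks=(5, 10, 20, 50),
                      lam=1.0, alpha=0.1, eta=0.01):
    """Return the K whose fitted model has the lowest evaluation-set error."""
    best_k, best_err = None, float("inf")
    for k in candidate_ks:
        model = fit_fn(train, num_topics=k, lam=lam, alpha=alpha, eta=eta)
        err = error_fn(model, evaluation)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

On the press-release corpora the paper reports its best results at K = 20, so a candidate grid that brackets that value is a reasonable starting point.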