On-the-fly Rectification for Robust Large-Vocabulary Topic Inference
Authors: Moontae Lee, Sungjun Cho, Kun Dong, David Mimno, David Bindel
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that applying LAW after ENN learns topics of quality comparable to using AW after AP based on the full co-occurrence. In the next two series of experiments, we demonstrate that our simultaneous rectification and compression maintains model quality while running in a fraction of the space and time needed for the original JSMF framework. |
| Researcher Affiliation | Collaboration | 1Information and Decision Sciences, University of Illinois at Chicago, Chicago, Illinois, USA (also affiliated with Microsoft Research, Redmond, Washington, USA) 2Computational Science and Engineering, Georgia Tech, Atlanta, Georgia, USA 3Applied Mathematics, Cornell University, Ithaca, New York, USA 4Information Science, Cornell University, Ithaca, New York, USA 5Computer Science, Cornell University, Ithaca, New York, USA. |
| Pseudocode | Yes | Algorithm 1 Anchor Word algorithm (AW), Algorithm 2 Rectified AW algorithm (RAW), Algorithm 3 ENN-rectification (ENN), Algorithm 4 PALM-rectification (PALM), Algorithm 5 Low-rank AW (LAW), Algorithm 6 Low-rank JSMF (LR-JSMF). |
| Open Source Code | Yes | The code is publicly available (Footnote 5: https://github.com/moontae/JSMF). |
| Open Datasets | Yes | We evaluate on real data: two standard textual datasets from the UCI Machine Learning repository (NeurIPS papers and New York Times articles) as well as two non-textual datasets (Movies from MovieLens 10M star-ratings and Songs from Yes.com complete playlists) previously used to show the performance of JSMF with AP in (Lee et al., 2015). |
| Dataset Splits | No | The paper mentions using standard datasets but does not provide specific details about training, validation, or test splits (e.g., percentages, sample counts, or explicit methodology for partitioning data) needed for reproduction. |
| Hardware Specification | No | The paper mentions running on 'laptop-grade hardware' in the conclusion, but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Next we compress C into Y_ENN and Y_PALM by running ENN (with \|I\| = 10K + 1000) and PALM (with s = 1e-4) until convergence. |
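The rectification step that the quoted setup builds on (AP, from Lee et al., 2015) can be illustrated with a minimal sketch. This is not the authors' released code: the function name `rectify_ap`, the fixed iteration count, and the projection ordering are illustrative assumptions; the idea is to cycle projections so the co-occurrence matrix C becomes (approximately) jointly stochastic: positive semidefinite, nonnegative, and summing to one.

```python
import numpy as np

def rectify_ap(C, n_iter=30):
    """Illustrative alternating-projection rectification (AP-style sketch).

    Repeatedly projects C onto three sets:
      1. the PSD cone (clip negative eigenvalues),
      2. the nonnegative orthant (clip negative entries),
      3. the normalization constraint (entries sum to 1).
    """
    C = (C + C.T) / 2  # symmetrize first
    for _ in range(n_iter):
        # PSD projection via eigendecomposition
        w, V = np.linalg.eigh(C)
        C = (V * np.clip(w, 0.0, None)) @ V.T
        # Nonnegativity projection
        C = np.clip(C, 0.0, None)
        # Normalize so all entries sum to 1 (scalar scaling keeps C PSD-ish and nonnegative)
        C = C / C.sum()
    return (C + C.T) / 2

# Usage sketch on a random symmetric "co-occurrence" matrix
rng = np.random.default_rng(0)
A = rng.random((20, 20))
R = rectify_ap((A + A.T) / 2)
```

The ENN and PALM variants evaluated in the paper replace this full-matrix projection cycle with cheaper low-rank updates, which is what enables the reported space and time savings.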