On-the-fly Rectification for Robust Large-Vocabulary Topic Inference

Authors: Moontae Lee, Sungjun Cho, Kun Dong, David Mimno, David Bindel

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experiments show that applying LAW after ENN learns topics of quality comparable to using AW after AP based on the full co-occurrence. In the next two series of experiments, we demonstrate that our simultaneous rectification and compression maintains model quality while running in a fraction of the space and time needed for the original JSMF framework. |
| Researcher Affiliation | Collaboration | (1) Information and Decision Sciences, University of Illinois at Chicago, Chicago, Illinois, USA (also affiliated with Microsoft Research, Redmond, Washington, USA); (2) Computational Science and Engineering, Georgia Tech, Atlanta, Georgia, USA; (3) Applied Mathematics, Cornell University, Ithaca, New York, USA; (4) Information Science, Cornell University, Ithaca, New York, USA; (5) Computer Science, Cornell University, Ithaca, New York, USA. |
| Pseudocode | Yes | Algorithm 1: Anchor Word algorithm (AW); Algorithm 2: Rectified AW algorithm (RAW); Algorithm 3: ENN-rectification (ENN); Algorithm 4: PALM-rectification (PALM); Algorithm 5: Low-rank AW (LAW); Algorithm 6: Low-rank JSMF (LR-JSMF). A minimal sketch of the anchor-word recipe follows the table. |
| Open Source Code | Yes | The code is publicly available. [Footnote 5: https://github.com/moontae/JSMF] |
| Open Datasets | Yes | We evaluate on real data: two standard textual datasets from the UCI Machine Learning Repository (NeurIPS papers and New York Times articles) as well as two non-textual datasets (Movies from MovieLens 10M star ratings and Songs from Yes.com complete playlists) previously used to show the performance of JSMF with AP in (Lee et al., 2015). |
| Dataset Splits | No | The paper uses standard datasets but does not report training/validation/test splits (percentages, sample counts, or an explicit partitioning procedure) needed for reproduction. |
| Hardware Specification | No | The conclusion mentions running on "laptop-grade hardware," but the paper does not give specific hardware details such as exact CPU/GPU models, processor types, or memory sizes used for the experiments. |
| Software Dependencies | No | The paper does not name ancillary software, such as libraries or solvers with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | Next we compress C into Y_ENN and Y_PALM by running ENN (with \|I\| = 10K + 1000) and PALM (with s = 1e-4) until convergence. A sketch of the rectification baseline follows the table. |
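
For orientation beyond the algorithm names in the Pseudocode row, here is a minimal NumPy sketch of the two-step anchor-word recipe that Algorithm 1 (AW) builds on: greedily select K near-orthogonal rows of the row-normalized co-occurrence matrix as anchors, then express every other row as a convex combination of them. This is a simplified illustration, not the paper's pseudocode: `C_bar`, the function names, and the clip-and-renormalize simplex step are assumptions of this sketch (the anchor-word literature typically uses an exponentiated-gradient solver for the second step).

```python
import numpy as np

def find_anchors(C_bar, K):
    """Greedy anchor search: repeatedly pick the row with the largest
    residual norm, then project the remaining rows off its span
    (a Gram-Schmidt / pivoted-QR style selection)."""
    Q = np.array(C_bar, dtype=float)      # work on a copy
    anchors = []
    for _ in range(K):
        norms = np.linalg.norm(Q, axis=1)
        i = int(np.argmax(norms))
        anchors.append(i)
        u = Q[i] / norms[i]
        Q -= np.outer(Q @ u, u)           # remove the new anchor's component
    return anchors

def recover_topics(C_bar, anchors, iters=200, lr=0.1):
    """Write each word's co-occurrence profile as a convex combination of
    the anchor rows via projected gradient on 0.5 * ||W A - C_bar||_F^2."""
    A = C_bar[anchors]                    # K x N anchor basis
    N, K = C_bar.shape[0], len(anchors)
    W = np.full((N, K), 1.0 / K)          # start at the simplex center
    for _ in range(iters):
        G = (W @ A - C_bar) @ A.T         # gradient with respect to W
        W = np.clip(W - lr * G, 1e-12, None)
        W /= W.sum(axis=1, keepdims=True) # renormalize rows onto the simplex
    return W
```

Roughly speaking, the rectified and low-rank variants listed in the table (RAW, LAW, LR-JSMF) run these same two steps on a rectified matrix or on its low-rank factor Y rather than on the raw statistics.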
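
The ENN and PALM runs quoted in the Experiment Setup row rectify and compress simultaneously, producing low-rank factors Y_ENN and Y_PALM instead of a dense matrix. As context, here is a minimal sketch of the full-matrix alternating-projection (AP) baseline from Lee et al. (2015) that the paper compares against: it cycles through the three constraint sets a joint-stochastic, rank-K co-occurrence matrix must satisfy. The iteration count and the cyclic projection order are assumptions of this sketch, and it deliberately does not model the paper's low-rank machinery or the |I| = 10K + 1000 and s = 1e-4 settings, which are specific to ENN and PALM.

```python
import numpy as np

def rectify_ap(C, K, iters=50):
    """Full-matrix AP rectification baseline: alternately project onto
    (1) PSD matrices of rank <= K, (2) matrices summing to one, and
    (3) the nonnegative orthant."""
    for _ in range(iters):
        # (1) nearest PSD rank-K matrix via truncated eigendecomposition
        w, V = np.linalg.eigh((C + C.T) / 2.0)  # eigenvalues ascending
        w = np.clip(w[-K:], 0.0, None)          # keep K largest, zero negatives
        C = (V[:, -K:] * w) @ V[:, -K:].T
        # (2) joint-stochasticity: shift entries so the matrix sums to one
        C = C + (1.0 - C.sum()) / C.size
        # (3) nonnegativity: clip negative entries
        C = np.clip(C, 0.0, None)
    return C
```

Each sweep of this baseline costs a full N x N eigendecomposition, which is the kind of expense the paper's on-the-fly low-rank rectification is designed to avoid.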