On-the-fly Rectification for Robust Large-Vocabulary Topic Inference

Authors: Moontae Lee, Sungjun Cho, Kun Dong, David Mimno, David Bindel

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experiments show that applying LAW after ENN learns topics of quality comparable to using AW after AP based on the full co-occurrence. In the next two series of experiments, we demonstrate that our simultaneous rectification and compression maintains model quality while running in a fraction of the space and time needed for the original JSMF framework. |
| Researcher Affiliation | Collaboration | (1) Information and Decision Sciences, University of Illinois at Chicago, Chicago, Illinois, USA (also affiliated with Microsoft Research, Redmond, Washington, USA); (2) Computational Science and Engineering, Georgia Tech, Atlanta, Georgia, USA; (3) Applied Mathematics, Cornell University, Ithaca, New York, USA; (4) Information Science, Cornell University, Ithaca, New York, USA; (5) Computer Science, Cornell University, Ithaca, New York, USA. |
| Pseudocode | Yes | Algorithm 1: Anchor Word algorithm (AW); Algorithm 2: Rectified AW algorithm (RAW); Algorithm 3: ENN-rectification (ENN); Algorithm 4: PALM-rectification (PALM); Algorithm 5: Low-rank AW (LAW); Algorithm 6: Low-rank JSMF (LR-JSMF). A minimal sketch of the anchor-word recipe follows the table. |
| Open Source Code | Yes | The code is publicly available. [Footnote 5: https://github.com/moontae/JSMF] |
| Open Datasets | Yes | We evaluate on real data: two standard textual datasets from the UCI Machine Learning Repository (NeurIPS papers and New York Times articles) as well as two non-textual datasets (Movies from MovieLens 10M star ratings and Songs from Yes.com complete playlists) previously used to show the performance of JSMF with AP in (Lee et al., 2015). |
| Dataset Splits | No | The paper uses standard datasets but does not report training/validation/test splits (percentages, sample counts, or an explicit partitioning procedure) needed for reproduction. |
| Hardware Specification | No | The conclusion mentions running on "laptop-grade hardware," but the paper does not give specific hardware details such as exact CPU/GPU models, processor types, or memory sizes used for the experiments. |
| Software Dependencies | No | The paper does not name ancillary software, such as libraries or solvers with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | Next we compress C into Y_ENN and Y_PALM by running ENN (with \|I\| = 10K + 1000) and PALM (with s = 1e-4) until convergence. A sketch of the rectification baseline follows the table. |
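
For orientation beyond the algorithm names in the Pseudocode row, here is a minimal NumPy sketch of the two-step anchor-word recipe that Algorithm 1 (AW) builds on: greedily select K near-orthogonal rows of the row-normalized co-occurrence matrix as anchors, then express every other row as a convex combination of them. This is a simplified illustration, not the paper's pseudocode: `C_bar`, the function names, and the clip-and-renormalize simplex step are assumptions of this sketch (the anchor-word literature typically uses an exponentiated-gradient solver for the second step).

```python
import numpy as np

def find_anchors(C_bar, K):
    """Greedy anchor search: repeatedly pick the row with the largest
    residual norm, then project the remaining rows off its span
    (a Gram-Schmidt / pivoted-QR style selection)."""
    Q = np.array(C_bar, dtype=float)      # work on a copy
    anchors = []
    for _ in range(K):
        norms = np.linalg.norm(Q, axis=1)
        i = int(np.argmax(norms))
        anchors.append(i)
        u = Q[i] / norms[i]
        Q -= np.outer(Q @ u, u)           # remove the new anchor's component
    return anchors

def recover_topics(C_bar, anchors, iters=200, lr=0.1):
    """Write each word's co-occurrence profile as a convex combination of
    the anchor rows via projected gradient on 0.5 * ||W A - C_bar||_F^2."""
    A = C_bar[anchors]                    # K x N anchor basis
    N, K = C_bar.shape[0], len(anchors)
    W = np.full((N, K), 1.0 / K)          # start at the simplex center
    for _ in range(iters):
        G = (W @ A - C_bar) @ A.T         # gradient with respect to W
        W = np.clip(W - lr * G, 1e-12, None)
        W /= W.sum(axis=1, keepdims=True) # renormalize rows onto the simplex
    return W
```

Roughly speaking, the rectified and low-rank variants listed in the table (RAW, LAW, LR-JSMF) run these same two steps on a rectified matrix or on its low-rank factor Y rather than on the raw statistics.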
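
The ENN and PALM runs quoted in the Experiment Setup row rectify and compress simultaneously, producing low-rank factors Y_ENN and Y_PALM instead of a dense matrix. As context, here is a minimal sketch of the full-matrix alternating-projection (AP) baseline from Lee et al. (2015) that the paper compares against: it cycles through the three constraint sets a joint-stochastic, rank-K co-occurrence matrix must satisfy. The iteration count and the cyclic projection order are assumptions of this sketch, and it deliberately does not model the paper's low-rank machinery or the |I| = 10K + 1000 and s = 1e-4 settings, which are specific to ENN and PALM.

```python
import numpy as np

def rectify_ap(C, K, iters=50):
    """Full-matrix AP rectification baseline: alternately project onto
    (1) PSD matrices of rank <= K, (2) matrices summing to one, and
    (3) the nonnegative orthant."""
    for _ in range(iters):
        # (1) nearest PSD rank-K matrix via truncated eigendecomposition
        w, V = np.linalg.eigh((C + C.T) / 2.0)  # eigenvalues ascending
        w = np.clip(w[-K:], 0.0, None)          # keep K largest, zero negatives
        C = (V[:, -K:] * w) @ V[:, -K:].T
        # (2) joint-stochasticity: shift entries so the matrix sums to one
        C = C + (1.0 - C.sum()) / C.size
        # (3) nonnegativity: clip negative entries
        C = np.clip(C, 0.0, None)
    return C
```

Each sweep of this baseline costs a full N x N eigendecomposition, which is the kind of expense the paper's on-the-fly low-rank rectification is designed to avoid.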