Provable Algorithms for Inference in Topic Models

Authors: Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models.
Researcher Affiliation | Academia | Sanjeev Arora ARORA@CS.PRINCETON.EDU Department of Computer Science, Princeton University; Rong Ge RONGGE@CS.DUKE.EDU Computer Science Department, Duke University; Frederic Koehler FKOEHLER@PRINCETON.EDU Department of Mathematics, Princeton University; Tengyu Ma TENGYU@CS.PRINCETON.EDU Department of Computer Science, Princeton University; Ankur Moitra MOITRA@MIT.EDU Department of Mathematics and CSAIL, Massachusetts Institute of Technology
Pseudocode | Yes | Algorithm 1 Thresholded Linear Inverse Algorithm (TLI)
Open Source Code | Yes | Code to reproduce the results is available at: https://github.com/frytvm/topic-inference
Open Datasets | No | The paper uses 'New York Times articles', 'Enron emails', and 'NIPS papers' but does not provide explicit access information (link, DOI, repository) or a specific citation for the datasets themselves.
Dataset Splits | No | The paper describes how synthetic data was generated and evaluated, and mentions using 'a subsample of real documents', but it does not specify explicit train/validation/test splits, percentages, or sample counts for any dataset.
Hardware Specification | No | The paper mentions 'Solving LP (3) on 16 processors' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions using the 'Mosek LP solver' and 'MALLET (McCallum, 2002)' but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | For each document, we sample r = 5 topics uniformly at random, and choose weights for these topics uniformly from the r-dimensional probability simplex.
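
The synthetic setup quoted under Experiment Setup can be illustrated with a minimal Python sketch. It is not the authors' code. The function name sample_topic_proportions, the total topic count k, and the choice to sample the r topics without replacement are assumptions made for illustration; the quoted text states only r = 5 and uniform weights on the simplex. Drawing from a Dirichlet distribution with all parameters equal to 1 gives the uniform distribution on the probability simplex.

```python
import numpy as np


def sample_topic_proportions(k, r=5, rng=None):
    """Return a length-k topic-proportion vector with r nonzero entries.

    k (total number of topics) is a hypothetical parameter; the quoted
    setup fixes only r = 5.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Pick r distinct topics uniformly at random
    # (sampling without replacement is an assumption).
    support = rng.choice(k, size=r, replace=False)
    # Dirichlet(1, ..., 1) is uniform on the r-dimensional probability simplex.
    weights = rng.dirichlet(np.ones(r))
    x = np.zeros(k)
    x[support] = weights
    return x


if __name__ == "__main__":
    x = sample_topic_proportions(k=50, r=5, rng=np.random.default_rng(0))
    print(np.flatnonzero(x), x.sum())
```

Passing a seeded generator, as in the usage lines, makes the draw repeatable; the value k = 50 there is arbitrary.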