Provable Algorithms for Inference in Topic Models
Authors: Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models. |
| Researcher Affiliation | Academia | Sanjeev Arora (ARORA@CS.PRINCETON.EDU), Department of Computer Science, Princeton University; Rong Ge (RONGGE@CS.DUKE.EDU), Computer Science Department, Duke University; Frederic Koehler (FKOEHLER@PRINCETON.EDU), Department of Mathematics, Princeton University; Tengyu Ma (TENGYU@CS.PRINCETON.EDU), Department of Computer Science, Princeton University; Ankur Moitra (MOITRA@MIT.EDU), Department of Mathematics and CSAIL, Massachusetts Institute of Technology |
| Pseudocode | Yes | Algorithm 1 Thresholded Linear Inverse Algorithm (TLI) |
| Open Source Code | Yes | Code to reproduce the results is available at: https://github.com/frytvm/topic-inference |
| Open Datasets | No | The paper uses 'New York Times articles', 'Enron emails', and 'NIPS papers' but does not provide explicit access information (link, DOI, repository) or a specific citation for the datasets themselves. |
| Dataset Splits | No | The paper describes how synthetic data was generated and evaluated, and mentions using 'a subsample of real documents', but it does not specify explicit train/validation/test splits, percentages, or sample counts for any dataset. |
| Hardware Specification | No | The paper mentions 'Solving LP (3) on 16 processors' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using the 'Mosek LP solver' and 'MALLET (McCallum, 2002)' but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | For each document, we sample r = 5 topics uniformly at random, and choose weights for these topics uniformly from the r-dimensional probability simplex. |
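
The synthetic sampling step quoted under Experiment Setup can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_topic_proportions` and the total topic count `k` are hypothetical, and the sketch relies on the standard fact that i.i.d. exponential draws normalized by their sum are uniformly distributed on the probability simplex.

```python
import random


def sample_topic_proportions(k, r=5, rng=None):
    """Return a length-k topic-proportion vector with r active topics.

    k is a hypothetical total number of topics; r = 5 follows the quoted setup.
    """
    rng = rng or random.Random()
    # Choose r distinct topics uniformly at random from the k available.
    support = rng.sample(range(k), r)
    # Normalized i.i.d. Exp(1) draws are uniform on the probability simplex
    # (equivalently, a Dirichlet(1, ..., 1) sample).
    draws = [rng.expovariate(1.0) for _ in range(r)]
    total = sum(draws)
    proportions = [0.0] * k
    for topic, draw in zip(support, draws):
        proportions[topic] = draw / total
    return proportions


if __name__ == "__main__":
    x = sample_topic_proportions(k=20, rng=random.Random(0))
    print([round(v, 3) for v in x])
```

Passing a seeded `random.Random` instance keeps the draw reproducible. Every weight outside the chosen support stays exactly zero, so each simulated document has at most `r` active topics.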