reproducibilityindex.ai

Provable Algorithms for Inference in Topic Models

Authors: Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

ICML 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models.
Researcher Affiliation	Academia	Sanjeev Arora ARORA@CS.PRINCETON.EDU Department of Computer Science, Princeton University; Rong Ge RONGGE@CS.DUKE.EDU Computer Science Department, Duke University; Frederic Koehler FKOEHLER@PRINCETON.EDU Department of Mathematics, Princeton University; Tengyu Ma TENGYU@CS.PRINCETON.EDU Department of Computer Science, Princeton University; Ankur Moitra MOITRA@MIT.EDU Department of Mathematics and CSAIL, Massachusetts Institute of Technology
Pseudocode	Yes	Algorithm 1 Thresholded Linear Inverse Algorithm (TLI)
Open Source Code	Yes	Code to reproduce the results is available at: https://github.com/frytvm/topic-inference
Open Datasets	No	The paper uses 'New York Times articles', 'Enron emails', and 'NIPS papers' but does not provide explicit access information (link, DOI, repository) or a specific citation for the datasets themselves.
Dataset Splits	No	The paper describes how synthetic data was generated and evaluated, and mentions using 'a subsample of real documents', but it does not specify explicit train/validation/test splits, percentages, or sample counts for any dataset.
Hardware Specification	No	The paper mentions 'Solving LP (3) on 16 processors' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies	No	The paper mentions using the 'Mosek LP solver' and 'MALLET (McCallum, 2002)' but does not specify version numbers for these software dependencies.
Experiment Setup	Yes	For each document, we sample r = 5 topics uniformly at random, and choose weights for these topics uniformly from the r-dimensional probability simplex.
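
The synthetic setup quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_topic_proportions` and the parameter `num_topics` are hypothetical, and it assumes the r topics are drawn without replacement. A uniform draw from the probability simplex is obtained here as a Dirichlet sample with all concentration parameters equal to 1.

```python
import numpy as np


def sample_topic_proportions(num_topics, r=5, rng=None):
    """Return a length-num_topics vector with r nonzero topic weights.

    Assumptions (not stated in the source): topics are chosen without
    replacement, and num_topics is the total number of topics in the model.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Pick r distinct topics uniformly at random.
    support = rng.choice(num_topics, size=r, replace=False)

    # Dirichlet(1, ..., 1) is the uniform distribution on the simplex.
    weights = rng.dirichlet(np.ones(r))

    proportions = np.zeros(num_topics)
    proportions[support] = weights
    return proportions


if __name__ == "__main__":
    x = sample_topic_proportions(num_topics=50, r=5, rng=np.random.default_rng(0))
    print("nonzero topics:", np.flatnonzero(x))
    print("weights sum to:", x.sum())
```

Passing a seeded `np.random.default_rng` makes the draw repeatable, which helps when comparing inference methods on the same synthetic documents.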