Conic Scan-and-Cover algorithms for nonparametric topic modeling
Authors: Mikhail Yurochkin, Aritra Guha, XuanLong Nguyen
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose new algorithms for topic modeling when the number of topics is unknown. Our approach relies on an analysis of the concentration of mass and angular geometry of the topic simplex, a convex polytope constructed by taking the convex hull of vertices representing the latent topics. Our algorithms are shown in practice to have accuracy comparable to a Gibbs sampler in terms of topic estimation, which requires the number of topics be given. Moreover, they are one of the fastest among several state of the art parametric techniques. Statistical consistency of our estimator is established under some conditions. ... 5 Experimental results |
| Researcher Affiliation | Academia | Mikhail Yurochkin Department of Statistics University of Michigan moonfolk@umich.edu Aritra Guha Department of Statistics University of Michigan aritra@umich.edu XuanLong Nguyen Department of Statistics University of Michigan xuanlong@umich.edu |
| Pseudocode | Yes | Algorithm 1 Conic Scan-and-Cover (CoSAC) ... Algorithm 2 CoSAC for documents |
| Open Source Code | Yes | Code is available at https://github.com/moonfolk/Geometric-Topic-Modeling. |
| Open Datasets | No | The paper mentions "NYTimes news articles" and generating synthetic data, but does not provide concrete access information (specific link, DOI, formal citation with authors/year, or clear reference to established benchmark datasets with access details) for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only mentions general concepts like "training time". |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Remark: We found the choices ω = 0.6 and R to be the median of {‖p_1‖_2, . . . , ‖p_M‖_2} to be robust in practice and agreeing with our theoretical results. ... The choice of λ is governed by results of Prop. 4. For small α_k = 1/K, ∀k, λ ≈ P(Λ_c) ... and for an equilateral B we can choose d such that cos(d) = ... Our approximations were based on large K to get a sense of λ; we now make a conservative choice λ = 0.001 ... Next we compare CoSAC to per-iteration quality of the Gibbs sampler trained with 500 iterations for M = 1000 and M = 5000. |
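The quoted setup describes the paper's two key geometric tunables: the cone angle ω and the stopping radius R (the median of the point norms). As a rough illustration of the cone-scanning idea those parameters control, the sketch below is a minimal, hypothetical Python rendering, not the authors' released implementation: it repeatedly takes the farthest remaining point from the simplex center as a topic direction and removes all points falling inside that cone (cosine similarity above ω). The real CoSAC additionally uses R to decide when the remaining points are noise rather than new topics, which this sketch omits.

```python
import numpy as np

def conic_scan_and_cover(points, omega=0.6):
    """Hypothetical sketch of the conic scan idea.

    points : (M, V) array of document points on the topic simplex.
    omega  : cosine threshold defining the cone angle (paper uses 0.6).
    Returns an array of estimated topic points, one per detected cone.
    """
    center = points.mean(axis=0)
    remaining = points - center  # work with residuals around the center
    topics = []
    while len(remaining) > 0:
        norms = np.linalg.norm(remaining, axis=1)
        far = remaining[np.argmax(norms)]        # farthest point spans a cone
        direction = far / np.linalg.norm(far)
        topics.append(center + far)              # cone apex as topic estimate
        # remove every point inside the cone around this direction
        cos = remaining @ direction / np.maximum(norms, 1e-12)
        remaining = remaining[cos < omega]
    return np.array(topics)

# Toy usage: three well-separated "topic" clusters at the simplex vertices.
pts = np.repeat(np.eye(3), 10, axis=0)
estimated = conic_scan_and_cover(pts, omega=0.6)
```

On this toy input the scan finds exactly three cones, one per vertex, without being told the number of topics, which is the nonparametric behavior the paper's reproducibility quote is configuring with ω, R, and λ.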