Geometric Dirichlet Means Algorithm for topic inference

Authors: Mikhail Yurochkin, XuanLong Nguyen

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The algorithm is evaluated with extensive experiments on simulated and real data.
Researcher Affiliation | Academia | Mikhail Yurochkin, Department of Statistics, University of Michigan, moonfolk@umich.edu; XuanLong Nguyen, Department of Statistics, University of Michigan, xuanlong@umich.edu
Pseudocode | Yes | Algorithm 1 Geometric Dirichlet Means (GDM)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include a link to a code repository.
Open Datasets | Yes | NIPS corpora analysis: We proceed with the analysis of the NIPS corpus (https://archive.ics.uci.edu/ml/datasets/Bag+of+Words). After preprocessing, there are 1738 documents and 4188 unique words. Length of documents ranges from 39 to 1403 with mean of 272. We consider K = 5, 10, 15, 20, α = 5/K, η = 0.1. For each value of K we set aside 300 documents chosen at random to compute the perplexity and average results over 3 repetitions. (A loading sketch for this corpus appears below the table.)
Dataset Splits | Yes | "The number of held-out documents is 100; results are averaged over 5 repetitions." and "For each value of K we set aside 300 documents chosen at random to compute the perplexity and average results over 3 repetitions."
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only mentions runtimes without specifying the hardware they were run on.
Software Dependencies | No | The paper mentions R and Python as programming languages and the Hartigan & Wong (1979) algorithm, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Unless otherwise specified, we set η = 0.1, α = 0.1, V = 1200, M = 1000, K = 5; N_m = 1000 for each m; the number of held-out documents is 100; results are averaged over 5 repetitions. Since finding exact solution to the k-means objective is NP hard, we use the algorithm of Hartigan & Wong (1979) with 10 restarts and the k-means++ initialization. (Sketches of the simulated-data setup and the clustering step appear below the table.)
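
The NIPS corpus quoted in the Open Datasets row is distributed through the UCI "Bag of Words" page. Below is a minimal loading sketch, assuming the standard layout of those files (three header lines giving the number of documents, vocabulary size, and number of nonzero entries, followed by "docID wordID count" triples with 1-based indices); the file name docword.nips.txt is an assumption about a local, un-gzipped copy, and the paper's own preprocessing down to 1738 documents and 4188 words is not reproduced.

```python
import numpy as np

def load_uci_bow(path):
    """Read a UCI Bag-of-Words docword file into a dense document-term count matrix."""
    with open(path) as f:
        n_docs = int(f.readline())
        n_words = int(f.readline())
        f.readline()  # number of nonzero entries; unused when building a dense matrix
        counts = np.zeros((n_docs, n_words))
        for line in f:
            d, w, c = map(int, line.split())
            counts[d - 1, w - 1] = c  # the file uses 1-based indices
    return counts

counts = load_uci_bow("docword.nips.txt")            # assumed local path
freqs = counts / counts.sum(axis=1, keepdims=True)   # per-document word frequencies
```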
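The Experiment Setup row lists the simulation hyperparameters, but no simulation code is released (Open Source Code: No). The sketch below draws a corpus with those settings under the standard LDA generative process; treating the simulated data as standard LDA draws is an assumption, not a statement from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
V, M, K, N_m = 1200, 1000, 5, 1000   # vocabulary size, documents, topics, words per document
eta, alpha = 0.1, 0.1                # topic-word and document-topic Dirichlet parameters

topics = rng.dirichlet(np.full(V, eta), size=K)    # K x V topic-word distributions
theta = rng.dirichlet(np.full(K, alpha), size=M)   # M x K document-topic proportions
doc_dists = theta @ topics                         # M x V per-document word distributions
# renormalize each row before sampling to guard against floating-point drift
counts = np.vstack([rng.multinomial(N_m, p / p.sum()) for p in doc_dists])  # M x V counts
```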
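The same row specifies the clustering configuration: k-means solved with the Hartigan & Wong (1979) algorithm, 10 restarts, and k-means++ initialization. The sketch below covers only that clustering backbone and the random held-out split, using scikit-learn, whose KMeans performs Lloyd-style updates rather than Hartigan & Wong (the latter is the default of R's kmeans); the geometric centroid-extension step of the paper's Algorithm 1 (GDM) is replaced here by a clip-and-renormalize placeholder, so this is not a faithful transcription of the algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
K, n_heldout = 5, 100   # 100 held-out documents for simulations; 300 for the NIPS corpus

# toy stand-in document-term counts; substitute the matrices from the sketches above
counts = rng.integers(1, 5, size=(1000, 1200))
freqs = counts / counts.sum(axis=1, keepdims=True)

# hold out documents chosen at random
perm = rng.permutation(len(freqs))
train, heldout = freqs[perm[n_heldout:]], freqs[perm[:n_heldout]]

# k-means with k-means++ initialization and 10 restarts (Lloyd-style updates)
km = KMeans(n_clusters=K, init="k-means++", n_init=10).fit(train)

# crude placeholder for the geometric extension of centroids in Algorithm 1
beta_hat = np.clip(km.cluster_centers_, 0.0, None)
beta_hat /= beta_hat.sum(axis=1, keepdims=True)    # rows: estimated topic distributions
```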