Parallel Sampling of HDPs using Sub-Cluster Splits

Authors: Jason Chang, John W. Fisher III

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on synthetic and real-world data validate the improved convergence of the proposed method.
Researcher Affiliation | Academia | Jason Chang (CSAIL, MIT, jchang7@csail.mit.edu); John W. Fisher III (CSAIL, MIT, fisher@csail.mit.edu)
Pseudocode | Yes | Algorithm 1 (Split-Merge Framework): 1. Propose assignments ẑ, global proportions β̂, document proportions π̂, and parameters θ̂. 2. Defer the proposal of auxiliary variables to the restricted sampling of Equations (1–10). 3. Accept/reject the proposal with the Hastings ratio. (See the illustrative sketch of this accept/reject test after the table.)
Open Source Code | Yes | All results are averaged over 10 sample paths. Source code can be downloaded from http://people.csail.mit.edu/jchang7.
Open Datasets | Yes | Next, we consider the Associated Press (AP) dataset [23] with 436K words in 2K documents. ... Finally, we consider two large datasets from [24]: Enron Emails with 6M words in 40K documents and NYTimes Articles with 100M words in 300K documents.
Dataset Splits | No | The paper discusses 'cross-validation techniques' and 'held-out word' evaluation but does not specify train/validation/test splits as percentages, absolute counts, or a reference to predefined splits.
Hardware Specification | No | Results using 16 cores (except DA, which cannot be parallelized) with 1, 25, 50, and 75 initial topics are shown in Figure 4a. (Only a core count is mentioned; no specific hardware is identified.)
Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks) are mentioned in the paper.
Experiment Setup | No | The paper states: '(1) initialize β and z randomly; (2) sample π, θ, π̄, and θ̄ via Equations (2, 3, 7, 8); (3) sample z and z̄ via Equations (4, 9); (4) propose K/2 local merges followed by K local splits; (5) propose a global merge followed by a global split; (6) sample m and m̄ via Equations (5, 10); (7) sample β and β̄ via Equations (1, 6); (8) repeat from Step 2 until convergence. We fix the hyper-parameters, but resampling techniques [2] can easily be incorporated.' However, specific hyper-parameter values (e.g., the concentration parameters) are not provided. (See the control-flow skeleton sketched after the table.)
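
For concreteness, the accept/reject step in Algorithm 1 is a standard Metropolis-Hastings test. Below is a minimal, generic sketch of that test; `hastings_accept` and its arguments are hypothetical names, not the authors' released implementation, and in the paper the target and proposal densities are defined by Equations (1–10).

```python
import numpy as np

def hastings_accept(log_joint_new, log_joint_old,
                    log_q_old_given_new, log_q_new_given_old, rng):
    """Generic Metropolis-Hastings test, evaluated in log space.

    Accepts with probability min(1, H), where
        H = p(new) q(old | new) / (p(old) q(new | old)).
    In the paper, p and q would be the joint HDP density and the
    split/merge proposal of Equations (1-10); here they are abstract.
    """
    log_h = (log_joint_new - log_joint_old) \
            + (log_q_old_given_new - log_q_new_given_old)
    return np.log(rng.uniform()) < min(0.0, log_h)

# Example: a proposal whose Hastings log-ratio is -0.5 is accepted
# with probability exp(-0.5) ~ 0.61.
rng = np.random.default_rng(0)
accepted = hastings_accept(-10.0, -9.7, -2.0, -1.8, rng)
```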
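
The eight quoted setup steps likewise form a simple iteration skeleton. The sketch below mirrors only that control flow; every method on the `sampler` object is a hypothetical placeholder for the corresponding conditional update, not the API of the released code.

```python
def run_sampler(sampler, num_iters):
    """Control-flow skeleton of the quoted Steps 1-8 (method names hypothetical)."""
    sampler.initialize_randomly()                # Step 1: random β and z
    for _ in range(num_iters):
        sampler.sample_proportions_and_params()  # Step 2: π, θ, π̄, θ̄ (Eqs. 2, 3, 7, 8)
        sampler.sample_assignments()             # Step 3: z and z̄ (Eqs. 4, 9)
        K = sampler.num_topics
        sampler.propose_local_merges(K // 2)     # Step 4: K/2 local merges ...
        sampler.propose_local_splits(K)          #         ... then K local splits
        sampler.propose_global_merge()           # Step 5: one global merge ...
        sampler.propose_global_split()           #         ... then one global split
        sampler.sample_table_counts()            # Step 6: m and m̄ (Eqs. 5, 10)
        sampler.sample_global_proportions()      # Step 7: β and β̄ (Eqs. 1, 6)
    return sampler                               # Step 8: iterate to convergence
```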