Statistical Model Aggregation via Parameter Matching

Authors: Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh, Kristjan Greenewald, Nghia Hoang

NeurIPS 2019

Reproducibility assessment (variable, result, and supporting excerpt from the paper):
Research Type: Experimental. Excerpt (Section 5, Experiments): "We begin with a correctness verification of our inference procedure via a simulated experiment. We randomly sample L = 50 global centroids θ_i ∈ R^50 from a Gaussian distribution θ_i ~ N(µ_0, σ_0^2 I). We then simulate j = 1, ..., J heterogeneous datasets by picking a random subset of global centroids and adding white noise with variance σ^2 to obtain the true local centroids, {v_jl}_{l=1}^{L_j} (following the generative process in Section 3 with Gaussian densities). Then each dataset is sampled from a Gaussian mixture model with the corresponding set of centroids."
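The generative process quoted above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the number of datasets J, the noise scales σ_0 and σ, the per-dataset sample count, and uniform mixture weights are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
D, L, J = 50, 50, 10                 # dimension and L = 50 centroids per the paper; J = 10 assumed
mu0, sigma0, sigma = 0.0, 1.0, 0.1   # prior mean and noise scales are illustrative assumptions

# Global centroids theta_i ~ N(mu0, sigma0^2 I) in R^50
theta = rng.normal(mu0, sigma0, size=(L, D))

datasets = []
for j in range(J):
    # Each heterogeneous dataset uses a random subset of the global centroids
    Lj = rng.integers(5, L + 1)
    idx = rng.choice(L, size=Lj, replace=False)
    # True local centroids: chosen global centroids plus white noise with variance sigma^2
    v_j = theta[idx] + rng.normal(0.0, sigma, size=(Lj, D))
    # Sample dataset j from a Gaussian mixture over its local centroids
    z = rng.integers(Lj, size=200)                          # uniform mixture weights assumed
    X_j = v_j[z] + rng.normal(0.0, sigma, size=(200, D))
    datasets.append(X_j)
```

SPAHM's inference task is then to recover the L global centroids from the J sets of local centroids alone.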
Researcher Affiliation: Collaboration. Excerpt: "IBM Research, MIT-IBM Watson AI Lab, Center for Computational Health."
Pseudocode: Yes. Excerpt: "Algorithm 1: Statistical Parameter Aggregation via Heterogeneous Matching (SPAHM)."
Open Source Code: Yes. Excerpt: "Code: https://github.com/IBM/SPAHM"
Open Datasets: Yes. Excerpt: "learning Gaussian topic models [7] where local topic models are learned from the Gutenberg dataset comprising 40 books. ... For this task, we utilize the GSOD data available from the National Oceanic and Atmospheric Administration containing the daily global surface weather summary from over 9000 stations across the world."
Dataset Splits: No. The paper mentions a test set for evaluation (e.g., "evaluate it on the test set consisting of a random subset"), but it does not define a validation split or give train/validation/test percentages or counts.
Hardware Specification: No. The paper does not state the hardware (e.g., GPU/CPU models or memory) used to run the experiments.
Software Dependencies: No. The paper does not list software dependencies with version numbers (e.g., "Python 3.8, PyTorch 1.9"); it names methods and algorithms but not the packages and versions used to implement them.
Experiment Setup: Yes. Excerpt: "We use basic k-means with k = 25 to cluster word embeddings of words present in a book to obtain local topics and then apply SPAHM, resulting in 155 topics. ... We used memoized variational inference [24] with random restarts and merge moves to alleviate local optima issues (see supplement for details about parameter settings and data pre-processing)."
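The local-topic step quoted above (k-means with k = 25 over a book's word embeddings) can be sketched as follows. This is an illustrative stand-in, not the authors' pipeline: the embedding dimension, vocabulary size, and the basic Lloyd's-iteration k-means are assumptions; the paper clusters real word embeddings per book.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Basic Lloyd's k-means; returns the k centroids (the 'local topics')."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest centroid (squared Euclidean distance)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty
        for c in range(k):
            if (labels == c).any():
                centroids[c] = X[labels == c].mean(axis=0)
    return centroids

# Toy stand-in for one book's word embeddings (100-d, 500-word vocabulary; both assumed)
emb = np.random.default_rng(1).normal(size=(500, 100))
local_topics = kmeans(emb, k=25)   # k = 25 local topics per book, as in the paper
```

Running this per book and then matching the resulting local topics across books with SPAHM is what yields the aggregated set of global topics.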