Model Fusion with Kullback-Leibler Divergence

Authors: Sebastian Claici, Mikhail Yurochkin, Soumya Ghosh, Justin Solomon

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the value of a Bayesian approach to model fusion through applications from topic models to neural networks." "5. Experimental Results"
Researcher Affiliation | Collaboration | CSAIL, MIT, Cambridge, Massachusetts, USA; MIT-IBM Watson AI Laboratory, Cambridge, Massachusetts, USA; IBM Research, Cambridge, Massachusetts, USA.
Pseudocode | No | The paper describes the algorithm steps in prose but does not include a formal pseudocode or algorithm block.
Open Source Code | Yes | Code link: https://github.com/IBM/KL-fusion
Open Datasets | Yes | "We measure the effectiveness of our procedure for fusing BNNs locally trained on MNIST digits." ... "We use mean-field inference with Gaussian Wishart variational distributions to obtain the approximate posterior of a sticky HDP-HMM (Fox et al., 2008), similar to the analogous experiment by Yurochkin et al. (2019a). KL fusion matches activities inferred from each subject." ... "Following Campbell & How (2014), we run decentralized variational inference on the latent Dirichlet allocation topic model (Blei et al., 2003). We verify our method against the Approximate Merging of Posteriors with Symmetry (AMPS) algorithm from Campbell & How (2014) on the 20 newsgroups dataset, consisting of 18,689 documents with 1,000 held out for testing and a vocabulary of 12,497 words after stemming and stop word removal." (A data-loading sketch for the 20 newsgroups setup appears after the table.)
Dataset Splits | No | The paper mentions splitting the MNIST data into five partitions and using 1,000 documents from the 20 newsgroups dataset for testing, but it does not specify a distinct validation set or the exact percentages of train/validation/test splits across all experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions various models and inference methods (e.g., Bayesian neural networks, Gaussian mixture models, variational inference) and refers to prior work for implementation details, but it does not list the specific software libraries, frameworks, or versions used.
Experiment Setup | Yes | "For this experiment, we split the MNIST training data into five partitions at random. We simulate a heterogeneous partitioning of the data by sampling the proportion p_k of each class k from a five-dimensional symmetric Dirichlet distribution with a concentration parameter of 0.8, and allocating a p_{k,j} proportion of the instances of class k to partition j. ... For each dataset we train a single 150-node hidden layer BNN with a horseshoe prior (Ghosh et al., 2019; 2018). Horseshoe is a shrinkage prior allowing BNNs to automatically select the appropriate number of hidden units. We use Gaussian variational distribution with diagonal covariance for the weights of the neurons. Details are presented in the supplement." (A partitioning sketch follows below.)
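For reference, here is a minimal sketch of the heterogeneous MNIST partitioning described in the Experiment Setup row: per-class proportions drawn from a five-dimensional symmetric Dirichlet with concentration 0.8, then used to allocate the instances of each class across partitions. The function and variable names (e.g., partition_mnist_heterogeneous) are illustrative assumptions and are not taken from the authors' repository.

```python
# Sketch (assumed implementation, not the authors' code): heterogeneous
# partitioning of MNIST labels across five partitions using a symmetric
# Dirichlet(0.8) over per-class proportions, as described in the paper.
import numpy as np

def partition_mnist_heterogeneous(labels, n_partitions=5, alpha=0.8, seed=0):
    """Return a list of index lists, one per partition.

    For each class k, proportions p_k ~ Dirichlet(alpha * 1) determine how
    many instances of that class are allocated to each partition j.
    """
    rng = np.random.default_rng(seed)
    partitions = [[] for _ in range(n_partitions)]
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        rng.shuffle(idx)
        p_k = rng.dirichlet(alpha * np.ones(n_partitions))
        # Convert proportions into cut points over the shuffled class indices.
        cuts = (np.cumsum(p_k)[:-1] * len(idx)).astype(int)
        for j, chunk in enumerate(np.split(idx, cuts)):
            partitions[j].extend(chunk.tolist())
    return partitions

# Usage with placeholder labels standing in for the MNIST training targets.
fake_labels = np.random.default_rng(1).integers(0, 10, size=60000)
parts = partition_mnist_heterogeneous(fake_labels)
print([len(p) for p in parts])
```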
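Likewise, a hedged sketch of the 20 newsgroups setup quoted in the Open Datasets row (1,000 documents held out for testing, stop words removed). It substitutes scikit-learn's fetch_20newsgroups and CountVectorizer for the preprocessing of Campbell & How (2014), which the paper does not fully specify, so the document count and the 12,497-word vocabulary may not be reproduced exactly.

```python
# Sketch (assumed setup): 20 newsgroups with 1,000 documents held out for
# testing and English stop words removed. Stemming is omitted, so the
# vocabulary will differ from the paper's 12,497 words.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

docs = fetch_20newsgroups(subset="all",
                          remove=("headers", "footers", "quotes")).data
train_docs, test_docs = train_test_split(docs, test_size=1000, random_state=0)

# Bag-of-words features for a topic model such as LDA.
vectorizer = CountVectorizer(stop_words="english", min_df=5)
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)
print(X_train.shape, X_test.shape)
```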