Model Fusion with Kullback-Leibler Divergence

Authors: Sebastian Claici, Mikhail Yurochkin, Soumya Ghosh, Justin Solomon

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the value of a Bayesian approach to model fusion through applications from topic models to neural networks." "5. Experimental Results"
Researcher Affiliation | Collaboration | CSAIL, MIT, Cambridge, Massachusetts, USA; MIT-IBM Watson AI Laboratory, Cambridge, Massachusetts, USA; IBM Research, Cambridge, Massachusetts, USA.
Pseudocode | No | The paper describes the algorithm steps in prose but does not include a formal pseudocode or algorithm block.
Open Source Code | Yes | Code link: https://github.com/IBM/KL-fusion
Open Datasets | Yes | "We measure the effectiveness of our procedure for fusing BNNs locally trained on MNIST digits." ... "We use mean-field inference with Gaussian Wishart variational distributions to obtain the approximate posterior of a sticky HDP-HMM (Fox et al., 2008), similar to the analogous experiment by Yurochkin et al. (2019a). KL fusion matches activities inferred from each subject." ... "Following Campbell & How (2014), we run decentralized variational inference on the latent Dirichlet allocation topic model (Blei et al., 2003). We verify our method against the Approximate Merging of Posteriors with Symmetry (AMPS) algorithm from Campbell & How (2014) on the 20 newsgroups dataset, consisting of 18,689 documents with 1,000 held out for testing and a vocabulary of 12,497 words after stemming and stop word removal." (A data-loading sketch for the 20 newsgroups setup appears after the table.)
Dataset Splits | No | The paper mentions splitting the MNIST data into five partitions and using 1,000 documents from the 20 newsgroups dataset for testing, but it does not specify a distinct validation set or the exact percentages of train/validation/test splits across all experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions various models and inference methods (e.g., Bayesian neural networks, Gaussian mixture models, variational inference) and refers to prior work for implementation details, but it does not list the specific software libraries, frameworks, or versions used.
Experiment Setup | Yes | "For this experiment, we split the MNIST training data into five partitions at random. We simulate a heterogeneous partitioning of the data by sampling the proportion p_k of each class k from a five-dimensional symmetric Dirichlet distribution with a concentration parameter of 0.8, and allocating a p_{k,j} proportion of the instances of class k to partition j. ... For each dataset we train a single 150-node hidden layer BNN with a horseshoe prior (Ghosh et al., 2019; 2018). Horseshoe is a shrinkage prior allowing BNNs to automatically select the appropriate number of hidden units. We use Gaussian variational distribution with diagonal covariance for the weights of the neurons. Details are presented in the supplement." (A partitioning sketch follows below.)
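For reference, here is a minimal sketch of the heterogeneous MNIST partitioning described in the Experiment Setup row: per-class proportions drawn from a five-dimensional symmetric Dirichlet with concentration 0.8, then used to allocate the instances of each class across partitions. The function and variable names (e.g., partition_mnist_heterogeneous) are illustrative assumptions and are not taken from the authors' repository.

```python
# Sketch (assumed implementation, not the authors' code): heterogeneous
# partitioning of MNIST labels across five partitions using a symmetric
# Dirichlet(0.8) over per-class proportions, as described in the paper.
import numpy as np

def partition_mnist_heterogeneous(labels, n_partitions=5, alpha=0.8, seed=0):
    """Return a list of index lists, one per partition.

    For each class k, proportions p_k ~ Dirichlet(alpha * 1) determine how
    many instances of that class are allocated to each partition j.
    """
    rng = np.random.default_rng(seed)
    partitions = [[] for _ in range(n_partitions)]
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        rng.shuffle(idx)
        p_k = rng.dirichlet(alpha * np.ones(n_partitions))
        # Convert proportions into cut points over the shuffled class indices.
        cuts = (np.cumsum(p_k)[:-1] * len(idx)).astype(int)
        for j, chunk in enumerate(np.split(idx, cuts)):
            partitions[j].extend(chunk.tolist())
    return partitions

# Usage with placeholder labels standing in for the MNIST training targets.
fake_labels = np.random.default_rng(1).integers(0, 10, size=60000)
parts = partition_mnist_heterogeneous(fake_labels)
print([len(p) for p in parts])
```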
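Likewise, a hedged sketch of the 20 newsgroups setup quoted in the Open Datasets row (1,000 documents held out for testing, stop words removed). It substitutes scikit-learn's fetch_20newsgroups and CountVectorizer for the preprocessing of Campbell & How (2014), which the paper does not fully specify, so the document count and the 12,497-word vocabulary may not be reproduced exactly.

```python
# Sketch (assumed setup): 20 newsgroups with 1,000 documents held out for
# testing and English stop words removed. Stemming is omitted, so the
# vocabulary will differ from the paper's 12,497 words.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

docs = fetch_20newsgroups(subset="all",
                          remove=("headers", "footers", "quotes")).data
train_docs, test_docs = train_test_split(docs, test_size=1000, random_state=0)

# Bag-of-words features for a topic model such as LDA.
vectorizer = CountVectorizer(stop_words="english", min_df=5)
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)
print(X_train.shape, X_test.shape)
```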