Model Fusion with Kullback-Leibler Divergence
Authors: Sebastian Claici, Mikhail Yurochkin, Soumya Ghosh, Justin Solomon
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate the value of a Bayesian approach to model fusion through applications from topic models to neural networks." The paper also contains a dedicated experimental section (5. Experimental Results). |
| Researcher Affiliation | Collaboration | CSAIL, MIT, Cambridge, Massachusetts, USA; MIT-IBM Watson AI Laboratory, Cambridge, Massachusetts, USA; IBM Research, Cambridge, Massachusetts, USA. |
| Pseudocode | No | The paper describes the algorithm steps in prose but does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Code link: https://github.com/IBM/KL-fusion |
| Open Datasets | Yes | We measure the effectiveness of our procedure for fusing BNNs locally trained on MNIST digits. ... We use mean-field inference with Gaussian Wishart variational distributions to obtain the approximate posterior of a sticky HDP-HMM (Fox et al., 2008), similar to the analogous experiment by Yurochkin et al. (2019a). KL fusion matches activities inferred from each subject. ... Following Campbell & How (2014), we run decentralized variational inference on the latent Dirichlet allocation topic model (Blei et al., 2003). We verify our method against the Approximate Merging of Posteriors with Symmetry (AMPS) algorithm from Campbell & How (2014) on the 20 newsgroups dataset, consisting of 18,689 documents with 1,000 held out for testing and a vocabulary of 12,497 words after stemming and stop word removal. (A hypothetical preprocessing sketch for the 20 newsgroups split is given after the table.) |
| Dataset Splits | No | The paper mentions splitting the MNIST training data into five partitions and holding out 1,000 documents from the 20 newsgroups dataset for testing, but it does not specify a distinct validation set or exact train/validation/test split percentages across all experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions various models and inference methods (e.g., Bayesian neural networks, Gaussian mixture models, variational inference), but it does not list the specific software libraries, frameworks, or version numbers required to run them. |
| Experiment Setup | Yes | For this experiment, we split the MNIST training data into five partitions at random. We simulate a heterogeneous partitioning of the data by sampling the proportion p_k of each class k from a five-dimensional symmetric Dirichlet distribution with a concentration parameter of 0.8, and allocating a p_{k,j} proportion of the instances of class k to partition j. ... For each dataset we train a single 150-node hidden layer BNN with a horseshoe prior (Ghosh et al., 2019; 2018). Horseshoe is a shrinkage prior allowing BNNs to automatically select the appropriate number of hidden units. We use a Gaussian variational distribution with diagonal covariance for the weights of the neurons. Details are presented in the supplement. (A hypothetical sketch of this partitioning scheme appears after the table.) |
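
The heterogeneous MNIST partitioning quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code: the function name, the use of NumPy, the random seed, and the rounding scheme are assumptions; only the five partitions, the symmetric Dirichlet with concentration 0.8, and the per-class allocation of a p_{k,j} fraction of class k to partition j follow the paper's description.

```python
# Minimal sketch of the heterogeneous split described in the paper: for each
# class k, sample proportions over 5 partitions from a symmetric Dirichlet(0.8)
# and allocate that fraction of the class-k examples to each partition.
# Function/variable names and the rounding behaviour are illustrative assumptions.
import numpy as np

def partition_heterogeneous(labels, num_partitions=5, concentration=0.8, seed=0):
    rng = np.random.default_rng(seed)
    partitions = [[] for _ in range(num_partitions)]
    for k in np.unique(labels):
        idx = rng.permutation(np.where(labels == k)[0])
        # proportions p_{k,1..J} drawn from a symmetric Dirichlet prior
        p_k = rng.dirichlet(concentration * np.ones(num_partitions))
        # cumulative counts determine how many class-k examples each partition gets
        cuts = (np.cumsum(p_k) * len(idx)).astype(int)[:-1]
        for j, chunk in enumerate(np.split(idx, cuts)):
            partitions[j].extend(chunk.tolist())
    return [np.array(sorted(p)) for p in partitions]

if __name__ == "__main__":
    # Random labels standing in for the MNIST training targets (60,000 examples).
    labels = np.random.randint(0, 10, size=60000)
    parts = partition_heterogeneous(labels)
    print([len(p) for p in parts])
```

With a concentration parameter of 0.8, the sampled proportions are fairly spread out, so each partition ends up with an uneven mix of classes, which is the intended heterogeneity.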
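
The 20 newsgroups setup referenced in the Open Datasets row (stemming, stop-word removal, 1,000 held-out documents, a vocabulary of roughly 12,497 words) can likewise be sketched. The choice of scikit-learn and NLTK, the exact preprocessing pipeline, and the vocabulary-truncation strategy are assumptions; only the dataset, the held-out count, and the vocabulary size come from the paper.

```python
# Sketch of a 20 newsgroups preprocessing pipeline consistent with the paper's
# description. The specific libraries (scikit-learn, NLTK) and parameters are
# assumptions, not the authors' pipeline.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stemmed(base_analyzer):
    # Wrap the default analyzer (tokenisation + stop-word removal) with stemming.
    return lambda doc: [stemmer.stem(tok) for tok in base_analyzer(doc)]

docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes")).data
base = CountVectorizer(stop_words="english").build_analyzer()
vectorizer = CountVectorizer(analyzer=stemmed(base), max_features=12497)
counts = vectorizer.fit_transform(docs)

# Hold out 1,000 documents for testing, as in the paper.
rng = np.random.default_rng(0)
perm = rng.permutation(counts.shape[0])
test_idx, train_idx = perm[:1000], perm[1000:]
X_train, X_test = counts[train_idx], counts[test_idx]
print(X_train.shape, X_test.shape)
```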