Training Bayesian Neural Networks with Sparse Subspace Variational Inference
Authors: Junbo Li, Zichen Miao, Qiang Qiu, Ruqi Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments show that SSVI sets new benchmarks in crafting sparse BNNs, achieving, for instance, a 10-20× compression in model size with under 3% performance drop, and up to 20× FLOPs reduction during training compared with dense VI training. We conduct experiments using CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009) and ResNet-18 (He et al., 2016) as the backbone networks. |
| Researcher Affiliation | Academia | Junbo Li¹, Zichen Miao², Qiang Qiu², Ruqi Zhang¹; ¹Department of Computer Science, ²School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA; {ljunbo,miaoz,qqiu,ruqiz}@purdue.edu |
| Pseudocode | Yes | Algorithm 1 Sparse Subspace Variational Inference (SSVI) *(a hedged sketch of such an alternating sparse-subspace loop is given below the table)* |
| Open Source Code | Yes | We release the code at https://github.com/ljb121002/SSVI. |
| Open Datasets | Yes | We conduct experiments using CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009) and ResNet-18 (He et al., 2016) as the backbone networks. |
| Dataset Splits | No | The paper mentions using the CIFAR-10 and CIFAR-100 datasets but does not provide training/validation/test split percentages or sample counts, nor does it explicitly state that the standard splits for these datasets are used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments; it only mentions "ResNet-18 as the backbone network", which is a model, not hardware. |
| Software Dependencies | No | The paper does not specify version numbers for any software components, programming languages, or libraries used in the experiments (e.g., Python version, PyTorch/TensorFlow version). |
| Experiment Setup | Yes | We train the model for 200 epochs, with a batch size of 128. For the optimizer, we use Adam with a learning rate of 0.001. The initial value for KL warm-up β is set to 0.1, and it increases linearly to 1.0 during the first 50 epochs. The initial standard deviation for the BNN's weights is set to 0.01. For the weight removal rate, we set it to 0.05, and for the weight addition rate, we set it to 0.05. The inner update steps M is set to 5. *(These settings are collected in the configuration sketch below the table.)* |
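The Pseudocode row refers to Algorithm 1 in the paper, which we do not reproduce here. The snippet below is only a minimal sketch of the alternating structure such a sparse-subspace VI loop can take: M inner variational updates inside the current subspace, followed by a subspace update that removes and re-adds a fixed fraction of weights. The toy Bayesian linear-regression problem, the standard-normal prior, the signal-to-noise-ratio removal criterion, the random re-activation rule, and all dimensions and learning rates are illustrative assumptions, not the paper's actual method or criteria; only the removal/addition rates (0.05) and M = 5 come from the Experiment Setup row.

```python
# Illustrative sketch of an alternating sparse-subspace VI loop on a toy
# Bayesian linear-regression problem. The removal criterion (|mu|/sigma) and
# random re-activation below are simplifying assumptions for illustration.
import torch

torch.manual_seed(0)

d, n = 50, 200                           # candidate weights, data points
X = torch.randn(n, d)
w_true = torch.zeros(d)
w_true[:5] = 2.0                         # sparse ground-truth weights
y = X @ w_true + 0.1 * torch.randn(n)

# Mean-field Gaussian variational parameters over all candidate weights.
mu = torch.zeros(d, requires_grad=True)
rho = torch.full((d,), -3.0, requires_grad=True)   # sigma = softplus(rho)
opt = torch.optim.Adam([mu, rho], lr=1e-2)

# Sparse subspace: boolean mask of currently active weights.
active = torch.zeros(d, dtype=torch.bool)
active[torch.randperm(d)[:10]] = True

removal_rate, addition_rate, M, beta = 0.05, 0.05, 5, 1.0

def neg_elbo():
    """Negative ELBO with a reparameterized sample restricted to the active subspace."""
    sigma = torch.nn.functional.softplus(rho)
    w = (mu + sigma * torch.randn(d)) * active.float()
    nll = 0.5 * ((y - X @ w) ** 2).sum()
    # KL between N(mu, sigma^2) and a standard-normal prior, active weights only.
    kl = (0.5 * (sigma ** 2 + mu ** 2 - 1.0) - torch.log(sigma))[active].sum()
    return nll + beta * kl

for outer in range(200):
    # Inner loop: M variational updates within the current subspace.
    for _ in range(M):
        opt.zero_grad()
        neg_elbo().backward()
        opt.step()

    # Subspace update: drop low-SNR active weights, re-activate the same number.
    with torch.no_grad():
        sigma = torch.nn.functional.softplus(rho)
        snr = mu.abs() / sigma
        k = max(1, int(removal_rate * int(active.sum())))
        active_idx = active.nonzero().flatten()
        drop = active_idx[snr[active_idx].argsort()[:k]]
        active[drop] = False
        inactive_idx = (~active).nonzero().flatten()
        grow = inactive_idx[torch.randperm(len(inactive_idx))[:k]]
        active[grow] = True

print("active weights:", sorted(active.nonzero().flatten().tolist()))
```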
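The Experiment Setup row lists concrete hyperparameters. The sketch below simply collects them into a configuration dict and implements the stated linear KL warm-up (β from 0.1 to 1.0 over the first 50 epochs). The dict keys, the function signature, and the per-epoch interpolation are assumptions for illustration; they are not taken from the released code at https://github.com/ljb121002/SSVI.

```python
# Hyperparameters quoted in the Experiment Setup row, gathered into one place.
# Key names are illustrative, not the released code's configuration keys.
config = {
    "epochs": 200,
    "batch_size": 128,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "kl_beta_init": 0.1,
    "kl_beta_final": 1.0,
    "kl_warmup_epochs": 50,
    "init_weight_std": 0.01,
    "weight_removal_rate": 0.05,
    "weight_addition_rate": 0.05,
    "inner_update_steps_M": 5,
}

def kl_beta(epoch: int, cfg: dict = config) -> float:
    """KL weight beta: linear increase from 0.1 to 1.0 over the first 50 epochs, then constant."""
    if epoch >= cfg["kl_warmup_epochs"]:
        return cfg["kl_beta_final"]
    frac = epoch / cfg["kl_warmup_epochs"]
    return cfg["kl_beta_init"] + frac * (cfg["kl_beta_final"] - cfg["kl_beta_init"])

if __name__ == "__main__":
    for e in (0, 25, 50, 100):
        print(e, round(kl_beta(e), 3))   # 0.1, 0.55, 1.0, 1.0
```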