Training Bayesian Neural Networks with Sparse Subspace Variational Inference

Authors: Junbo Li, Zichen Miao, Qiang Qiu, Ruqi Zhang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our extensive experiments show that SSVI sets new benchmarks in crafting sparse BNNs, achieving, for instance, a 10-20x compression in model size with under 3% performance drop, and up to 20x FLOPs reduction during training compared with dense VI training." "We conduct experiments using CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009) and ResNet-18 (He et al., 2016) as the backbone networks."
Researcher Affiliation | Academia | Junbo Li (1), Zichen Miao (2), Qiang Qiu (2), Ruqi Zhang (1); (1) Department of Computer Science, (2) School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA; {ljunbo,miaoz,qqiu,ruqiz}@purdue.edu
Pseudocode | Yes | "Algorithm 1: Sparse Subspace Variational Inference (SSVI)" (an illustrative sketch of the alternating update appears after this table).
Open Source Code | Yes | "We release the code at https://github.com/ljb121002/SSVI."
Open Datasets | Yes | "We conduct experiments using CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009) and ResNet-18 (He et al., 2016) as the backbone networks."
Dataset Splits | No | The paper uses CIFAR-10 and CIFAR-100 but does not explicitly state training/validation/test split percentages or sample counts, nor does it explicitly say that the standard splits are used (the standard torchvision splits are illustrated after this table).
Hardware Specification | No | The paper does not specify the hardware used for its experiments (e.g., GPU/CPU models or memory); it only names ResNet-18 as the backbone network, which is a model, not hardware.
Software Dependencies | No | The paper does not specify version numbers for any software components, programming languages, or libraries used in the experiments (e.g., Python version, PyTorch/TensorFlow version).
Experiment Setup | Yes | "We train the model for 200 epochs, with a batch size of 128. For the optimizer, we use Adam with a learning rate of 0.001. The initial value for KL warm-up β is set to 0.1, and it increases linearly to 1.0 during the first 50 epochs. The initial standard deviation for the BNN's weights is set to 0.01. For the weight removal rate, we set it to 0.05, and for the weight addition rate, we set it to 0.05. The inner update steps M is set to 5." (These hyperparameters are collected into a configuration sketch after this table.)
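
As a companion to the pseudocode row above, here is a minimal sketch of the alternating structure suggested by Algorithm 1 and the quoted hyperparameters (inner update steps M, weight removal and addition rates): M variational updates restricted to the active sparse subspace, followed by one subspace update. The signal-to-noise importance score, the random re-growth rule, the flattened 1-D parameter tensors, and all function and argument names are assumptions made for illustration, not the paper's exact criteria or interfaces.

    import torch

    def ssvi_outer_step(mu, log_sigma, active_mask, elbo_loss, optimizer,
                        M=5, removal_rate=0.05, addition_rate=0.05):
        """One illustrative outer SSVI iteration on flattened 1-D parameter
        tensors (mu, log_sigma) with a 0/1 active_mask of the same shape.
        `elbo_loss` is an assumed callable returning the KL-weighted ELBO."""
        # (1) Inner loop: M variational-inference updates on the active subspace.
        for _ in range(M):
            optimizer.zero_grad()
            loss = elbo_loss(mu * active_mask, log_sigma)
            loss.backward()
            mu.grad.mul_(active_mask)         # keep inactive coordinates frozen
            log_sigma.grad.mul_(active_mask)
            optimizer.step()

        # (2) Subspace update: drop the least important active weights and
        #     activate new candidates (score and re-growth rule are assumptions).
        with torch.no_grad():
            score = mu.abs() / log_sigma.exp()            # signal-to-noise ratio
            active_idx = active_mask.nonzero(as_tuple=True)[0]
            n_remove = int(removal_rate * active_idx.numel())
            drop = active_idx[torch.argsort(score[active_idx])[:n_remove]]
            active_mask[drop] = 0.0

            inactive_idx = (active_mask == 0).nonzero(as_tuple=True)[0]
            n_add = int(addition_rate * active_idx.numel())
            grow = inactive_idx[torch.randperm(inactive_idx.numel())[:n_add]]
            active_mask[grow] = 1.0
        return active_mask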
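
On dataset splits: CIFAR-10 and CIFAR-100 ship with a fixed 50,000-image training set and a 10,000-image test set, which is what "standard splits" would typically mean here. The snippet below shows how those splits load from torchvision; the optional validation carve-out is purely an assumption, since the paper does not describe one.

    import torch
    from torchvision import datasets, transforms

    transform = transforms.ToTensor()

    # Standard CIFAR-10 splits as distributed by torchvision:
    # 50,000 training images and 10,000 test images.
    train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
    test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

    # Optional held-out validation split (assumed, not described in the paper).
    val_size = 5000
    train_subset, val_subset = torch.utils.data.random_split(
        train_set, [len(train_set) - val_size, val_size],
        generator=torch.Generator().manual_seed(0))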
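
For convenience, the quoted experiment setup can be collected into a single configuration dictionary; the key names below are ours and need not match the released code.

    # Hyperparameters quoted in the experiment-setup row (key names are assumed).
    config = {
        "datasets": ["CIFAR-10", "CIFAR-100"],
        "backbone": "ResNet-18",
        "epochs": 200,
        "batch_size": 128,
        "optimizer": "Adam",
        "learning_rate": 1e-3,
        "kl_beta_init": 0.1,        # KL warm-up weight, increased linearly ...
        "kl_beta_final": 1.0,       # ... to 1.0 over the first 50 epochs
        "kl_warmup_epochs": 50,
        "init_weight_std": 0.01,    # initial std of the BNN weights
        "removal_rate": 0.05,       # fraction of weights removed per subspace update
        "addition_rate": 0.05,      # fraction of weights added per subspace update
        "inner_update_steps_M": 5,
    }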