Enhancing Diversity in Bayesian Deep Learning via Hyperspherical Energy Minimization of CKA

Authors: David Smerkous, Qinxun Bai, Fuxin Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on both diverse ensembles and hypernetworks show that our approach significantly outperforms baselines in terms of uncertainty quantification in both synthetic and realistic outlier detection tasks." "In this section, we conduct experiments on several datasets, ranging from synthetic tasks to realistic out-of-distribution (OOD) detection problems, to validate our approach."
Researcher Affiliation | Collaboration | David Smerkous, Oregon State University, Corvallis, OR, USA (smerkoud@oregonstate.edu); Qinxun Bai, Horizon Robotics, Sunnyvale, CA, USA (qinxun.bai@gmail.com); Li Fuxin, Oregon State University, Corvallis, OR, USA (lif@oregonstate.edu)
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is publicly available at https://github.com/Deep-Machine-Vision/he-cka-ensembles.
Open Datasets | Yes | "Experiments on synthetic data, MNIST, CIFAR, and Tiny ImageNet show that our approach maintains the predictive accuracy of ensemble models while boosting their performance in uncertainty estimation across both synthetic and realistic datasets." The evaluation covers Dirty-MNIST, Fashion-MNIST, CIFAR-10/100, SVHN, and Tiny ImageNet, with dataset citations provided in the references (LeCun et al., 1998; Xiao et al., 2017; Krizhevsky, 2009; Netzer et al., 2011; Le & Yang, 2015; Cimpoi et al., 2014).
Dataset Splits | Yes | The data was split 80:10:10 into training, validation, and test sets. A split sketch follows the table.
Hardware Specification | Yes | "We evaluated mini-batch training time averaged over 50 batches on a Quadro RTX 8000." A timing sketch follows the table.
Software Dependencies | No | The paper mentions optimizers such as AdamW and SGD but does not provide version numbers for any software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Experimental settings and training details are provided in the paper and Appendix C; limitations are discussed in Appendix D, memory usage and computational efficiency in Appendix G, and synthetic OOD example generation in Appendix E.2. Models were trained either with AdamW (lr = 0.0065, weight decay 0.001) for 50 epochs or with SGD (learning rate 0.1, weight decay 5e-4) for 200 epochs, depending on the experiment. The HE-CKA term used a linear kernel for feature calculation with the exponential kernel (s = 2, γ = 1.0). Optimizer and HE-CKA sketches follow the table.
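
For the 80:10:10 split noted in the Dataset Splits row, here is a minimal sketch of how such a split can be produced with PyTorch's random_split. The helper name and seed are illustrative, not taken from the paper or its code.

```python
# Minimal sketch of an 80:10:10 train/validation/test split
# (assumption: a PyTorch-style dataset; seed and name are illustrative).
import torch
from torch.utils.data import random_split

def split_80_10_10(dataset, seed=0):
    n = len(dataset)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    n_test = n - n_train - n_val  # remainder absorbs rounding error
    return random_split(dataset, [n_train, n_val, n_test],
                        generator=torch.Generator().manual_seed(seed))
```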
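For the Hardware Specification row, a sketch of how mini-batch training time averaged over 50 batches is typically measured on a GPU. The model, loader, optimizer, and loss function arguments are hypothetical; the paper does not publish its timing harness.

```python
# Sketch of timing mini-batch training, averaged over 50 batches
# (assumption: a CUDA device; all argument names are hypothetical).
import time
import torch

def mean_batch_time(model, loader, optimizer, loss_fn,
                    n_batches=50, device="cuda"):
    model.train().to(device)
    times = []
    for i, (x, y) in enumerate(loader):
        if i == n_batches:
            break
        x, y = x.to(device), y.to(device)
        torch.cuda.synchronize()  # exclude previously queued GPU work
        start = time.perf_counter()
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        torch.cuda.synchronize()  # wait for this step to finish on the GPU
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```

Synchronizing before and after each step matters because CUDA kernels launch asynchronously; without it, the timer would measure only kernel-launch overhead.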
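The two training recipes quoted in the Experiment Setup row map directly onto standard PyTorch optimizers, as sketched below. The placeholder network is illustrative only; which experiments use which recipe is detailed in the paper's Appendix C, not here.

```python
# The two quoted training configurations as PyTorch optimizers
# (assumption: the placeholder network stands in for an ensemble
# member or hypernetwork).
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # illustrative placeholder

# 50-epoch runs:
opt_adamw = torch.optim.AdamW(model.parameters(), lr=0.0065,
                              weight_decay=0.001)
# 200-epoch runs:
opt_sgd = torch.optim.SGD(model.parameters(), lr=0.1,
                          weight_decay=5e-4)
```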
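Finally, a minimal sketch of the ingredients named in the HE-CKA setup: linear-kernel CKA between two members' feature matrices, and a hyperspherical-energy-style repulsion built from pairwise CKA values with an exponential kernel (s = 2, γ = 1.0). This is an illustration of the idea under stated assumptions; the exact formulation in the paper and the released code may differ in detail.

```python
# Sketch: linear-kernel CKA plus an exponential-kernel energy over
# pairwise CKA values. The mapping from CKA to a dissimilarity d and
# the kernel form exp(-gamma * d**s) are assumptions for illustration.
import torch

def linear_cka(X, Y, eps=1e-8):
    """CKA with a linear kernel; X, Y are (n_samples, n_features)."""
    X = X - X.mean(dim=0, keepdim=True)  # center features
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic_xy = (Y.T @ X).norm() ** 2
    hsic_xx = (X.T @ X).norm() ** 2
    hsic_yy = (Y.T @ Y).norm() ** 2
    return hsic_xy / (hsic_xx.sqrt() * hsic_yy.sqrt() + eps)

def he_cka_energy(feats, s=2, gamma=1.0):
    """Mean energy over all member pairs.

    feats: list of (n_samples, n_features) tensors, one per member.
    Near-identical representations (d -> 0) contribute maximal energy,
    so minimizing this term encourages diverse features.
    """
    energy, pairs = 0.0, 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            cka = linear_cka(feats[i], feats[j])
            d = (1.0 - cka).clamp_min(0.0).sqrt()  # dissimilarity from CKA
            energy = energy + torch.exp(-gamma * d ** s)
            pairs += 1
    return energy / max(pairs, 1)
```

In training, an energy term of this kind would be added to the task loss with a weighting coefficient, pushing ensemble members toward dissimilar intermediate representations while the task loss preserves predictive accuracy.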