A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors

Authors: Olivier Laurent, Emanuel Aldea, Gianni Franchi

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper presents one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures.
Researcher Affiliation | Academia | Olivier Laurent (1, 2), Emanuel Aldea (1) & Gianni Franchi (2); (1) SATIE, Paris-Saclay University; (2) U2IS, ENSTA Paris, Polytechnic Institute of Paris
Pseudocode | No | No structured pseudocode or algorithm blocks with explicit labels like "Algorithm" or "Pseudocode" were found in the paper.
Open Source Code | Yes | "To help replicate our work, we share the source code of our experiments on GitHub, notably including code to remove symmetries from neural networks a posteriori." (See the symmetry-removal sketch after this table.)
Open Datasets | Yes | "To ensure transparency and accessibility, we use publicly available datasets, including MNIST, Fashion-MNIST, CIFAR100, SVHN, ImageNet-200, and Textures. Please refer to Appendix C.2.2 for details on these datasets."
Dataset Splits | No | No explicit statement providing specific percentages or sample counts for training, validation, and test splits across all datasets was found.
Hardware Specification | Yes | "This work was performed using HPC resources from GENCI-IDRIS (Grant 2023-AD011011970R3)." and "Training a substantial number of checkpoints for estimating the posterior, especially in the case of the thousand models trained on TinyImageNet, was energy intensive (around 3 Nvidia V100 hours per training)."
Software Dependencies | No | No explicit version numbers for software dependencies were provided in the text; for example, the paper mentions TorchUncertainty and cvxpy (Diamond & Boyd, 2016) without specific versions.
Experiment Setup | Yes | "We train OptuNet for 60 epochs with batches of size 64 using stochastic gradient descent (SGD) with a start learning rate of 0.04 and a weight decay of 2×10⁻⁴. We decay the learning rate twice during training, at epochs 15 and 30, dividing the learning rate by 2." (See the training sketch after this table.)
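The Open Source Code row notes that the released repository includes code to remove symmetries from neural networks a posteriori. The sketch below is not that implementation; it is a minimal, assumed illustration of one common way to canonicalize the permutation symmetry of a single hidden layer: sort its units by a fixed statistic and apply the same permutation to the rows of the producing layer and the columns of the consuming layer, which leaves the network's predictions unchanged. The function name, the toy MLP, and the sorting criterion are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch: canonicalize the permutation symmetry of one hidden
# layer of an MLP by sorting its units by the L2 norm of incoming weights.
# This is NOT the paper's implementation, only an illustration of the idea.
import torch
import torch.nn as nn

def canonicalize_hidden_layer(fc_in: nn.Linear, fc_out: nn.Linear) -> None:
    """Permute the hidden units produced by fc_in and consumed by fc_out so
    that permutation-equivalent checkpoints map to the same weights while
    the network's function is unchanged."""
    with torch.no_grad():
        # One possible canonical order: descending L2 norm of each unit's
        # incoming weights (the criterion is an assumption).
        order = torch.argsort(fc_in.weight.norm(dim=1), descending=True)
        fc_in.weight.copy_(fc_in.weight[order])
        if fc_in.bias is not None:
            fc_in.bias.copy_(fc_in.bias[order])
        # Apply the same permutation to the consuming layer's input columns.
        fc_out.weight.copy_(fc_out.weight[:, order])

# Usage on a toy two-layer MLP (sizes are arbitrary placeholders):
mlp = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
canonicalize_hidden_layer(mlp[0], mlp[2])
```

Because the same index permutation is applied to the rows of the first weight matrix (and its bias) and to the columns of the second, the composed function is identical before and after canonicalization; only the representative chosen among permutation-equivalent weight vectors changes.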
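The Experiment Setup row reports concrete training hyperparameters for OptuNet. The following is a minimal PyTorch sketch of that schedule, assuming a placeholder model and random data: only the numbers (60 epochs, batch size 64, SGD, initial learning rate 0.04, weight decay 2×10⁻⁴, learning rate halved at epochs 15 and 30) come from the report, while the model, dataset, and loop structure are illustrative assumptions.

```python
# Hypothetical training loop mirroring the reported hyperparameters.
# The model and data below are stand-ins, not OptuNet or the paper's datasets.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder model
dataset = TensorDataset(torch.randn(1024, 1, 28, 28), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True)  # batches of size 64

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.04, weight_decay=2e-4)
# MultiStepLR with gamma=0.5 divides the learning rate by 2 at epochs 15 and 30.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[15, 30], gamma=0.5)

for epoch in range(60):  # 60 epochs
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```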