A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors
Authors: Olivier Laurent, Emanuel Aldea, Gianni Franchi
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. |
| Researcher Affiliation | Academia | Olivier Laurent¹·², Emanuel Aldea¹ & Gianni Franchi²; ¹SATIE, Paris-Saclay University; ²U2IS, ENSTA Paris, Polytechnic Institute of Paris |
| Pseudocode | No | No structured pseudocode or algorithm blocks with explicit labels like "Algorithm" or "Pseudocode" were found in the paper. |
| Open Source Code | Yes | To help replicate our work, we share the source code of our experiments on GitHub, notably including code to remove symmetries from neural networks a posteriori. |
| Open Datasets | Yes | To ensure transparency and accessibility, we use publicly available datasets, including MNIST, Fashion-MNIST, CIFAR-100, SVHN, ImageNet-200, and Textures. Please refer to Appendix C.2.2 for details on these datasets. |
| Dataset Splits | No | No explicit statement providing specific percentages or sample counts for training, validation, and test splits across all datasets was found. |
| Hardware Specification | Yes | This work was performed using HPC resources from GENCI-IDRIS (Grant 2023[AD011011970R3]). Training a substantial number of checkpoints for estimating the posterior, especially in the case of the thousand models trained on Tiny ImageNet, was energy intensive (around 3 Nvidia V100 hours per training). |
| Software Dependencies | No | No explicit version numbers for software dependencies were provided in the text; for example, it mentions 'TorchUncertainty' and 'cvxpy (Diamond & Boyd, 2016)' without specific versions. |
| Experiment Setup | Yes | We train OptuNet for 60 epochs with batches of size 64 using stochastic gradient descent (SGD) with an initial learning rate of 0.04 and a weight decay of 2×10⁻⁴. We decay the learning rate twice during training, at epochs 15 and 30, dividing the learning rate by 2 (see the sketch after the table). |
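The reported hyperparameters map directly onto a standard PyTorch training configuration. The snippet below is a minimal sketch under that assumption; the tiny model and random data are hypothetical stand-ins, not the paper's OptuNet architecture or datasets, and only the hyperparameters come from the quoted setup.

```python
# Minimal PyTorch sketch of the reported training recipe: 60 epochs, batch
# size 64, SGD with lr 0.04 and weight decay 2e-4, lr halved at epochs 15 and 30.
# The model and data below are placeholders, not the paper's OptuNet or datasets.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder stand-ins for OptuNet and the training set.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
data = TensorDataset(torch.randn(512, 1, 28, 28), torch.randint(0, 10, (512,)))
loader = DataLoader(data, batch_size=64, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.04, weight_decay=2e-4)
# Divide the learning rate by 2 at epochs 15 and 30.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[15, 30], gamma=0.5
)

for epoch in range(60):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```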