Tackling covariate shift with node-based Bayesian neural networks
Authors: Trung Q Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the method on out-of-distribution image classification benchmarks, and show improved uncertainty estimation of node-based BNNs under covariate shift due to input perturbations. In this section, we present experimental results of node-based BNNs on image classification tasks. For the datasets, we use CIFAR (Krizhevsky et al., 2009) and TINYIMAGENET (Le & Yang, 2015), which have corrupted versions of the test set provided by Hendrycks & Dietterich (2019). We use VGG16 (Simonyan & Zisserman, 2014), RESNET18 (He et al., 2016a) and PREACTRESNET18 (He et al., 2016b) for the architectures. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Aalto University, Finland 2Department of Computer Science, University of Helsinki, Finland 3Department of Computer Science, University of Manchester, UK. |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/AaltoPML/node-BNN-covariate-shift. |
| Open Datasets | Yes | For the datasets, we use CIFAR (Krizhevsky et al., 2009) and TINYIMAGENET (Le & Yang, 2015), which have corrupted versions of the test set provided by Hendrycks & Dietterich (2019). |
| Dataset Splits | No | The paper does not explicitly provide specific details on training/validation/test splits (e.g., percentages, sample counts, or explicit splitting methodology), beyond mentioning the datasets themselves which often come with standard splits. |
| Hardware Specification | Yes | All experiments were performed on one Tesla V100 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python or deep learning frameworks used. |
| Experiment Setup | Yes | F.2. Experimental details and hyperparameters: For all the experiments on CIFAR10/CIFAR100, we run each experiment for 300 epochs, where we increase β from 0 to 1 for the first 200 epochs. We use SGD as our optimizer, and we use a weight decay of 0.0005 for the parameters θ. We use a batch size of 128. For all the experiments on TINYIMAGENET, we run each experiment for 150 epochs, where we increase β from 0 to 1 for the first 100 epochs. We use a batch size of 256. Below, we use λ1 and λ2 to denote the learning rates of the parameters θ and ϕ, respectively. For VGG16, we set the initial learning rate λ1 = λ2 = 0.05, and we decrease λ1 linearly from 0.05 to 0.0005 from epoch 150 to epoch 270, while keeping λ2 fixed throughout training. We initialize the standard deviations with N⁺(0.30, 0.02) and set the standard deviation of the prior to 0.30. |
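The schedules quoted in the Experiment Setup row can be written out explicitly. The sketch below is an illustration only, not code from the paper's repository: the function names, default arguments, and the piecewise-linear form are assumptions inferred from the description (β annealed linearly from 0 to 1 over the first 200 epochs on CIFAR, and λ1 for VGG16 decayed linearly from 0.05 to 0.0005 between epochs 150 and 270).

```python
def beta_schedule(epoch: int, warmup_epochs: int = 200) -> float:
    """KL weight beta: linear ramp from 0 to 1 over the warmup epochs
    (200 for CIFAR per the paper, 100 for TINYIMAGENET), then held at 1."""
    return min(epoch / warmup_epochs, 1.0)


def vgg16_lr_schedule(epoch: int,
                      lr_init: float = 0.05,
                      lr_final: float = 0.0005,
                      decay_start: int = 150,
                      decay_end: int = 270) -> float:
    """Learning rate lambda_1 for theta (VGG16): constant at lr_init,
    then linearly decreased to lr_final between decay_start and decay_end,
    then held at lr_final for the remaining epochs."""
    if epoch <= decay_start:
        return lr_init
    if epoch >= decay_end:
        return lr_final
    frac = (epoch - decay_start) / (decay_end - decay_start)
    return lr_init + frac * (lr_final - lr_init)
```

Under this reading, β reaches 0.5 at epoch 100 of a CIFAR run, and λ1 sits exactly halfway between 0.05 and 0.0005 at epoch 210. λ2 (for ϕ) would simply be a constant 0.05 throughout training.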