Input-gradient space particle inference for neural network ensembles
Authors: Trung Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on image classification datasets and transfer learning tasks show that FoRDE significantly outperforms the gold-standard DEs and other ensemble methods in accuracy and calibration under covariate shift due to input perturbations. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Aalto University, Finland; ²Department of Computer Science, University of Helsinki, Finland; ³Department of Computer Science, University of Manchester, United Kingdom |
| Pseudocode | Yes | We describe the training algorithm of FoRDE in Algorithm 1. Algorithm 1: FoRDE |
| Open Source Code | Yes | Our code is available at https://github.com/AaltoPML/FoRDE. We have included our code in the supplementary material and we provide instructions on how to run our experiments in a README.md available in the provided codebase. |
| Open Datasets | Yes | We report performance of Fo RDE against other methods on CIFAR-10/100 (Krizhevsky, 2009) and TINYIMAGENET (Le & Yang, 2015)... All datasets used for our experiments are publicly available. |
| Dataset Splits | No | The paper mentions using "CIFAR-10 and CIFAR-100" and "TINYIMAGENET" for training and evaluation. It also states "For evaluations on input perturbations, we use CIFAR-10/100-C and TINYIMAGENET-C". However, it does not explicitly provide specific percentages or counts for training, validation, and test splits, nor does it specify a validation set directly. |
| Hardware Specification | Yes | in RESNET18/CIFAR-100 experiments of Section 5.2 with an ensemble size of 10, a DE took 31 seconds per epoch on an Nvidia A100 GPU, while FoRDE took 101 seconds per epoch. |
| Software Dependencies | No | The paper mentions "JAX (Bradbury et al., 2018) or PyTorch (Paszke et al., 2019)" as the automatic differentiation libraries used, but does not provide specific version numbers for these or any other software dependencies needed for reproduction. |
| Experiment Setup | Yes | For all the experiments, we used SGD with Nesterov momentum as our optimizer, and we set the momentum coefficient to 0.9. We used a weight decay λ of 5 × 10⁻⁴ and we set the learning rate η to 10⁻¹. We used a batch size of 128 and we set ϵ in Algorithm 1 to 10⁻¹². We used 15 bins to calculate ECE during evaluation. We ran each experiment for 300 epochs. We decreased the learning rate η linearly from 10⁻¹ to 10⁻³ from epoch 150 to epoch 270. (A minimal sketch of this configuration appears below the table.) |
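
The reported hyperparameters map onto a standard optimizer configuration. The following is a minimal, hedged sketch in PyTorch (the paper's own code may instead be written in JAX); `model` and the empty training-loop body are hypothetical placeholders, not the paper's actual ResNet-18 pipeline.

```python
# Sketch of the reported training setup: SGD with Nesterov momentum (0.9),
# weight decay 5e-4, initial learning rate 1e-1, batch size 128, 300 epochs,
# with the learning rate decayed linearly from 1e-1 to 1e-3 over epochs 150-270.
import torch

model = torch.nn.Linear(3072, 100)  # placeholder network, not the paper's ResNet-18

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-1,            # initial learning rate η
    momentum=0.9,       # Nesterov momentum coefficient
    nesterov=True,
    weight_decay=5e-4,  # weight decay λ
)

def lr_factor(epoch: int) -> float:
    """Multiplicative factor implementing the linear decay of η from 1e-1 to 1e-3."""
    if epoch < 150:
        return 1.0
    if epoch > 270:
        return 0.01
    return 1.0 - 0.99 * (epoch - 150) / (270 - 150)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(300):
    # ... one pass over the training set with batch size 128 would go here ...
    scheduler.step()
```

At epoch 150 the factor is 1.0 (η = 10⁻¹) and at epoch 270 it reaches 0.01 (η = 10⁻³), matching the quoted schedule; the ε = 10⁻¹² constant belongs to Algorithm 1 itself and is not part of the optimizer configuration shown here.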