Training independent subnetworks for robust prediction

Authors: Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, Dustin Tran

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We observe a significant improvement in negative log-likelihood, accuracy, and calibration error on CIFAR10, CIFAR100, ImageNet, and their out-of-distribution variants compared to previous methods.
Researcher Affiliation | Collaboration | Marton Havasi, Department of Engineering, University of Cambridge (mh740@cam.ac.uk); Rodolphe Jenatton, Google Research (rjenatton@google.com); Stanislav Fort, Stanford University (sfort1@stanford.edu); Jeremiah Zhe Liu, Google Research & Harvard University (jereliu@google.com); Jasper Snoek, Google Research (jsnoek@google.com); Balaji Lakshminarayanan, Google Research (balajiln@google.com); Andrew M. Dai, Google Research (adai@google.com); Dustin Tran, Google Research (trandustin@google.com)
Pseudocode | Yes | Algorithm 1 Train(X) ... Algorithm 2 Evaluate(x) (a minimal sketch of both algorithms is given below the table)
Open Source Code | Yes | MIMO's code is open-sourced: https://github.com/google/edward2/tree/master/experimental/mimo
Open Datasets | Yes | We observe a significant improvement in negative log-likelihood, accuracy, and calibration error on CIFAR10, CIFAR100, ImageNet, and their out-of-distribution variants compared to previous methods.
Dataset Splits | No | The paper explicitly mentions using 'training' and 'test' sets for the synthetic example and refers to CIFAR10, CIFAR100, and ImageNet for the main experiments, but it does not specify a distinct validation-set split or methodology for its main experiments.
Hardware Specification | Yes | To measure computational cost, we look at how long it takes to evaluate the model on a TPUv2 core, measured in ms per example.
Software Dependencies | No | The paper mentions using the 'Uncertainty Baselines' framework but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For the ResNet28-10/CIFAR models, we use a batch-size of 512, a decaying learning rate of 0.1 (decay rate 0.1) and L2 regularization 2e-4. The Deterministic, Dropout and Ensemble models are trained for 200 epochs while BatchEnsemble, Naive multihead and TreeNet are trained for 250 epochs. For MIMO, we use the hyperparameters of the baseline implementations wherever possible. For the ResNet28-10/CIFAR models, we use a batch-size of 512 with decaying learning rate of 0.1 (decay rate 0.1), L2 regularization 3e-4, 250 training epochs, and a batch repetition of 4. For the ResNet50/ImageNet models, we use a batch-size of 4096 with decaying learning rate of 0.1 (decay rate 0.1), L2 regularization 1e-4, 150 training epochs, and batch repetition of 2. (These settings are restated as a configuration sketch below the table.)
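The Pseudocode row refers to the paper's Algorithm 1 (Train) and Algorithm 2 (Evaluate). Below is a minimal PyTorch sketch of the MIMO idea, assuming an image classifier whose backbone accepts ensemble_size × 3 input channels; names such as MIMOClassifier, train_step, and evaluate are illustrative, not the authors' identifiers, and the paper's input-repetition and batch-repetition tricks are omitted. The reference implementation is in the edward2 repository linked above.

```python
# Sketch of MIMO training (Algorithm 1) and evaluation (Algorithm 2).
# Assumption: `backbone` maps (B, ensemble_size * 3, H, W) images to (B, feat_dim) features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MIMOClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, ensemble_size: int):
        super().__init__()
        self.backbone = backbone
        self.ensemble_size = ensemble_size
        # One linear head per subnetwork, implemented as a single wide layer.
        self.heads = nn.Linear(feat_dim, num_classes * ensemble_size)

    def forward(self, x):                       # x: (B, ensemble_size * 3, H, W)
        feats = self.backbone(x)                # (B, feat_dim)
        logits = self.heads(feats)              # (B, ensemble_size * num_classes)
        return logits.view(x.shape[0], self.ensemble_size, -1)


def train_step(model, optimizer, images, labels):
    """Algorithm 1 (sketch): each subnetwork sees an independently shuffled
    copy of the batch and is trained to predict its own labels."""
    batch_size = images.shape[0]
    perms = [torch.randperm(batch_size) for _ in range(model.ensemble_size)]
    x = torch.cat([images[p] for p in perms], dim=1)    # concatenate along channels
    y = torch.stack([labels[p] for p in perms], dim=1)  # (B, ensemble_size)
    logits = model(x)                                   # (B, ensemble_size, num_classes)
    loss = F.cross_entropy(logits.flatten(0, 1), y.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def evaluate(model, images):
    """Algorithm 2 (sketch): feed the same input to every subnetwork and
    average the softmax outputs to form the ensemble prediction."""
    x = images.repeat(1, model.ensemble_size, 1, 1)     # tile the input M times
    probs = model(x).softmax(dim=-1)                    # (B, ensemble_size, num_classes)
    return probs.mean(dim=1)
```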
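For reference, the MIMO hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The dictionary below only restates those numbers; the key names are illustrative and do not correspond to actual flags in the edward2/Uncertainty Baselines code.

```python
# Quoted MIMO training settings, restated as plain Python (illustrative key names).
MIMO_CONFIGS = {
    "resnet28-10_cifar": {
        "batch_size": 512,
        "base_learning_rate": 0.1,
        "lr_decay_rate": 0.1,
        "l2_regularization": 3e-4,
        "train_epochs": 250,
        "batch_repetition": 4,
    },
    "resnet50_imagenet": {
        "batch_size": 4096,
        "base_learning_rate": 0.1,
        "lr_decay_rate": 0.1,
        "l2_regularization": 1e-4,
        "train_epochs": 150,
        "batch_repetition": 2,
    },
}
```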