Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

Authors: Ravid Shwartz-Ziv, Micah Goldblum, Hossein Souri, Sanyam Kapoor, Chen Zhu, Yann LeCun, Andrew G. Wilson

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now conduct a thorough empirical evaluation of our transfer learning pipeline on image classification and semantic segmentation. We perform image classification experiments on four downstream tasks: CIFAR-10, CIFAR-100 [30], Oxford Flowers-102 [36], and Oxford-IIIT Pets [38]. On semantic segmentation, we use a DeepLabv3+ system [4] with ResNet-50 and ResNet-101 backbone architectures, and we evaluate performance on the Pascal VOC 2012 [14] and Cityscapes [7] datasets.
Researcher Affiliation | Collaboration | Ravid Shwartz-Ziv, New York University, ravid.shwartz.ziv@nyu.edu; Micah Goldblum, New York University, goldblum@nyu.edu; Hossein Souri, Johns Hopkins University; Sanyam Kapoor, New York University; Chen Zhu, University of Maryland; Yann LeCun, New York University and Meta AI Research; Andrew Gordon Wilson, New York University, andrewgw@cims.nyu.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We release PyTorch pre-trained priors and code for learning and using priors for downstream inference: https://github.com/hsouri/BayesianTransferLearning.
Open Datasets | Yes | We use a SimCLR (SSL) ResNet-50 checkpoint [6] pre-trained on the ImageNet-1k dataset [9] and fit our prior to the SimCLR loss function. ... We perform image classification experiments on four downstream tasks: CIFAR-10, CIFAR-100 [30], Oxford Flowers-102 [36], and Oxford-IIIT Pets [38]. On semantic segmentation, we use a DeepLabv3+ system [4] with ResNet-50 and ResNet-101 backbone architectures, and we evaluate performance on the Pascal VOC 2012 [14] and Cityscapes [7] datasets.
Dataset Splits | Yes | We select the highest performing scalar value across a grid on a holdout set from the downstream task. If we do not scale the covariance enough, our prior will become concentrated around parameters which are inconsistent with the downstream task... We perform image classification experiments on four downstream tasks: CIFAR-10, CIFAR-100 [30], Oxford Flowers-102 [36], and Oxford-IIIT Pets [38]. On semantic segmentation, we use a DeepLabv3+ system [4] with ResNet-50 and ResNet-101 backbone architectures, and we evaluate performance on the Pascal VOC 2012 [14] and Cityscapes [7] datasets.
Hardware Specification | Yes | For instance, learning an ImageNet-scale prior using SWAG with a ResNet-50 backbone takes approximately 10 hours on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We adopt the ResNet-50 and ResNet-101 architectures [18], and we scale the input image to 224×224 pixels to accommodate these feature extractors designed for ImageNet data. We use a SimCLR (SSL) ResNet-50 checkpoint [6] pre-trained on the ImageNet-1k dataset [9] and fit our prior to the SimCLR loss function. ... We provide a detailed description of hyperparameters in Appendix C.1.
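
The quotes above (Open Source Code, Experiment Setup) describe the overall recipe: a prior learned on the pre-training loss (e.g. SWAG fit to the SimCLR objective) is reused to regularize downstream training. The following is a minimal, hedged sketch of that idea, reduced to a diagonal Gaussian prior and MAP training. It is not the released implementation, which uses SWAG's low-rank-plus-diagonal covariance and also supports full Bayesian inference; the names `gaussian_prior_penalty`, `training_step`, `prior_mean`, and `prior_var` are illustrative.

```python
# Sketch: downstream MAP training with a learned Gaussian prior.
# Assumption (not the paper's released code): the prior is simplified to a
# per-parameter diagonal Gaussian N(mu, scale * sigma^2).
import torch.nn.functional as F

def gaussian_prior_penalty(model, prior_mean, prior_var, scale=1.0):
    """Negative log-density of a diagonal Gaussian prior,
    up to terms that do not depend on the model parameters."""
    penalty = 0.0
    for name, p in model.named_parameters():
        mu = prior_mean[name]
        var = prior_var[name] * scale  # covariance rescaling factor (grid-searched)
        penalty = penalty + 0.5 * ((p - mu) ** 2 / var).sum()
    return penalty

def training_step(model, batch, optimizer, prior_mean, prior_var, scale, n_train):
    x, y = batch
    optimizer.zero_grad()
    nll = F.cross_entropy(model(x), y)  # mean negative log-likelihood over the batch
    # Dividing the prior term by the training-set size makes the mini-batch
    # loss an unbiased estimate of (1/N) times the negative log-posterior.
    loss = nll + gaussian_prior_penalty(model, prior_mean, prior_var, scale) / n_train
    loss.backward()
    optimizer.step()
    return loss.item()
```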
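
The Dataset Splits row quotes how the scalar covariance-scaling factor is chosen: by downstream performance on a holdout set, over a fixed grid. A schematic of that selection loop is below; the grid values are arbitrary placeholders, and `train_with_prior` and `evaluate` are hypothetical helpers standing in for the training step above and a standard accuracy evaluation.

```python
# Sketch: grid search over the prior covariance scale on a holdout set.
def select_covariance_scale(prior_mean, prior_var, train_loader, holdout_loader,
                            grid=(1e0, 1e1, 1e2, 1e3, 1e4)):  # illustrative grid
    best_scale, best_acc = None, -1.0
    for scale in grid:
        # Hypothetical helpers: train a model under the rescaled prior,
        # then score it on the downstream holdout split.
        model = train_with_prior(prior_mean, prior_var, scale, train_loader)
        acc = evaluate(model, holdout_loader)
        if acc > best_acc:
            best_scale, best_acc = scale, acc
    return best_scale
```

As the quoted passage notes, too small a scale concentrates the prior around pre-training parameters that may conflict with the downstream task, which is why the scale is tuned per task on held-out data.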