Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors
Authors: Ravid Shwartz-Ziv, Micah Goldblum, Hossein Souri, Sanyam Kapoor, Chen Zhu, Yann LeCun, Andrew G. Wilson
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now conduct a thorough empirical evaluation of our transfer learning pipeline on image classification and semantic segmentation. We perform image classification experiments on four downstream tasks: CIFAR-10, CIFAR-100 [30], Oxford Flowers-102 [36], and Oxford-IIIT Pets [38]. On semantic segmentation, we use a DeepLabv3+ system [4] with ResNet-50 and ResNet-101 backbone architectures, and we evaluate performance on the Pascal VOC 2012 [14] and Cityscapes [7] datasets. |
| Researcher Affiliation | Collaboration | Ravid Shwartz-Ziv, New York University (ravid.shwartz.ziv@nyu.edu); Micah Goldblum, New York University (goldblum@nyu.edu); Hossein Souri, Johns Hopkins University; Sanyam Kapoor, New York University; Chen Zhu, University of Maryland; Yann LeCun, New York University and Meta AI Research; Andrew Gordon Wilson, New York University (andrewgw@cims.nyu.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release PyTorch pre-trained priors and code for learning and using priors for downstream inference: https://github.com/hsouri/BayesianTransferLearning. |
| Open Datasets | Yes | We use a SimCLR (SSL) ResNet-50 checkpoint [6] pre-trained on the ImageNet-1k dataset [9] and fit our prior to the SimCLR loss function. ... We perform image classification experiments on four downstream tasks: CIFAR-10, CIFAR-100 [30], Oxford Flowers-102 [36], and Oxford-IIIT Pets [38]. On semantic segmentation, we use a DeepLabv3+ system [4] with ResNet-50 and ResNet-101 backbone architectures, and we evaluate performance on the Pascal VOC 2012 [14] and Cityscapes [7] datasets. |
| Dataset Splits | Yes | We select the highest performing scalar value across a grid on a holdout set from the downstream task. If we do not scale the covariance enough, our prior will become concentrated around parameters which are inconsistent with the downstream task... We perform image classification experiments on four downstream tasks: CIFAR-10, CIFAR-100 [30], Oxford Flowers-102 [36], and Oxford-IIIT Pets [38]. On semantic segmentation, we use a DeepLabv3+ system [4] with ResNet-50 and ResNet-101 backbone architectures, and we evaluate performance on the Pascal VOC 2012 [14] and Cityscapes [7] datasets. (A hedged sketch of this covariance-scale grid search appears below the table.) |
| Hardware Specification | Yes | For instance, learning an ImageNet-scale prior using SWAG with a ResNet-50 backbone takes approximately 10 hours on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We adopt the ResNet-50 and ResNet-101 architectures [18], and we scale the input image to 224 × 224 pixels to accommodate these feature extractors designed for ImageNet data. We use a SimCLR (SSL) ResNet-50 checkpoint [6] pre-trained on the ImageNet-1k dataset [9] and fit our prior to the SimCLR loss function. ... We provide a detailed description of hyperparameters in Appendix C.1. (A minimal sketch of this downstream setup follows the table.) |
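
The "Dataset Splits" row quotes the paper's procedure of picking the prior covariance scale by grid search on a holdout split of the downstream task. Below is a minimal sketch of that selection loop, assuming a diagonal Gaussian prior N(μ, scale·σ²) stored as per-parameter tensors; the helper names (`fine_tune_with_prior`, `select_prior_scale`), the candidate grid, and the MAP-style fine-tuning objective are illustrative assumptions, not the authors' released implementation.

```python
# Sketch: choose the prior covariance scale on a holdout set (assumed diagonal Gaussian prior).
import copy
import torch

def gaussian_prior_penalty(model, prior_mean, prior_var, scale):
    """Negative log-density (up to a constant) of a diagonal Gaussian prior
    whose covariance has been multiplied by `scale`."""
    penalty = 0.0
    for name, p in model.named_parameters():
        if name not in prior_mean:   # e.g. a freshly initialized downstream head
            continue
        mu = prior_mean[name].to(p.device)
        var = prior_var[name].to(p.device) * scale
        penalty = penalty + 0.5 * ((p - mu) ** 2 / var).sum()
    return penalty

def fine_tune_with_prior(model, train_loader, prior_mean, prior_var, scale,
                         epochs=5, lr=1e-3, device="cuda"):
    """MAP-style fine-tuning: downstream task loss plus the rescaled prior penalty."""
    model = copy.deepcopy(model).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            loss = criterion(model(x), y)
            loss = loss + gaussian_prior_penalty(model, prior_mean, prior_var, scale)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

@torch.no_grad()
def holdout_accuracy(model, holdout_loader, device="cuda"):
    model.eval()
    correct, total = 0, 0
    for x, y in holdout_loader:
        preds = model(x.to(device)).argmax(dim=1).cpu()
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total

def select_prior_scale(model, train_loader, holdout_loader, prior_mean, prior_var,
                       grid=(1e-2, 1e-1, 1.0, 1e1, 1e2)):
    """Pick the covariance scale whose fine-tuned model scores best on the holdout split."""
    scores = {}
    for scale in grid:
        tuned = fine_tune_with_prior(model, train_loader, prior_mean, prior_var, scale)
        scores[scale] = holdout_accuracy(tuned, holdout_loader)
    return max(scores, key=scores.get), scores
```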
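The "Experiment Setup" row describes resizing downstream images to 224 × 224 and starting from a SimCLR-pre-trained ResNet-50. A hedged PyTorch/torchvision sketch of that setup for CIFAR-10 follows; the checkpoint filename `simclr_resnet50.pt` and the layout of its state dict are assumptions for illustration.

```python
# Sketch: downstream data pipeline and backbone initialization (assumed checkpoint layout).
import torch
import torchvision
from torchvision import transforms

# Upscale 32x32 CIFAR images to 224x224 so they match the ImageNet-scale backbone.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                         transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True,
                                           num_workers=4)

# ResNet-50 backbone; load the (assumed) SimCLR-pre-trained weights, then replace
# the final layer with a fresh head for the 10 downstream classes.
model = torchvision.models.resnet50(weights=None)
state_dict = torch.load("simclr_resnet50.pt", map_location="cpu")  # hypothetical checkpoint
model.load_state_dict(state_dict, strict=False)  # head keys may be missing or renamed
model.fc = torch.nn.Linear(model.fc.in_features, 10)
```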