What Variables Affect Out-of-Distribution Generalization in Pretrained Models?
Authors: Md Yousuf Harun, Kyungbok Lee, Gianmarco Gallardo, Giri Krishnan, Christopher Kanan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the factors influencing transferability and out-of-distribution (OOD) generalization of pre-trained DNN embeddings... We comprehensively investigate the impact of DNN architecture, training data, image resolution, and augmentations on transferability... Using 64 pre-trained ID backbones and 8,604 linear probes, we identify conditions that exacerbate, reduce, and eliminate the tunnel effect. (A minimal linear-probe sketch is given below the table.) |
| Researcher Affiliation | Academia | ¹Rochester Institute of Technology, ²University of Rochester, ³Georgia Tech |
| Pseudocode | No | The paper describes methods and procedures in narrative text, but it does not contain any formally structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project website and code: https://yousuf907.github.io/oodg. We intend to release the dataset of results and our SHAP Slope analysis code on our project website, where the code will have an open-source license. |
| Open Datasets | Yes | We used 11 datasets in total in our paper and they are ImageNet-1K, ImageNet-100, ImageNet-R, CIFAR-10, CIFAR-100, NINCO, CUB-200, Aircrafts, Oxford Pets, Flowers-102, and STL-10. All datasets are widely used and publicly available. |
| Dataset Splits | Yes | The dataset is split into 50,000 training images and 10,000 test images (CIFAR-10). Standard test sets are used for all ID datasets. |
| Hardware Specification | Yes | We ran all experiments using four NVIDIA A5000 GPUs, including training backbones and linear probes. |
| Software Dependencies | No | The paper states 'We implemented our code in Python using PyTorch.' but does not specify version numbers for Python, PyTorch, or any other libraries or frameworks. |
| Experiment Setup | Yes | For training VGGm-11/17 on ImageNet-100, we employ the AdamW optimizer with an LR of 6 × 10⁻³ and a WD of 5 × 10⁻² for batch size 512. The model is trained for 100 epochs using the Cosine Annealing LR scheduler with a linear warmup of 5 epochs. We use label smoothing of 0.1 with cross-entropy loss. (A hedged sketch of this recipe follows the table.) |
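The Research Type row mentions training 8,604 linear probes on 64 pretrained ID backbones. Below is a minimal PyTorch sketch of what such a probe looks like, assuming the backbone returns a flat embedding of size `embed_dim`; the `LinearProbe` class and its interface are illustrative placeholders, not the authors' released code.

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Illustrative linear probe: a frozen pretrained backbone feeding one
    trainable linear classifier. The paper attaches probes at many layers;
    this sketch shows only the basic frozen-backbone + linear-head pattern."""

    def __init__(self, backbone: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                      # freeze pretrained weights
        self.head = nn.Linear(embed_dim, num_classes)    # only this layer is trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.backbone(x)                     # extract the ID embedding
        feats = torch.flatten(feats, start_dim=1)
        return self.head(feats)
```

Because only `head` receives gradients, the probe's accuracy on an OOD dataset measures how transferable the frozen embedding is, which is the quantity the paper studies.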
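For the training recipe quoted in the Experiment Setup row, here is a hedged PyTorch sketch. `torchvision.models.vgg11` stands in for the authors' modified VGGm-11, and wiring the linear warmup into cosine annealing via `SequentialLR` (with an assumed warmup start factor of 0.01) is one reasonable reading of the quoted description, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torchvision

epochs, warmup_epochs, batch_size = 100, 5, 512

# Stand-in for VGGm-11 trained on ImageNet-100 (100 classes).
model = torchvision.models.vgg11(num_classes=100)

# Cross-entropy with label smoothing of 0.1, as reported.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# AdamW with LR 6e-3 and weight decay 5e-2 for batch size 512.
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-3, weight_decay=5e-2)

# Linear warmup for the first 5 epochs, then cosine annealing for the rest.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.01, total_iters=warmup_epochs)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs - warmup_epochs)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

# Per-epoch loop (data loading omitted); the scheduler is stepped once per epoch:
# for epoch in range(epochs):
#     train_one_epoch(model, train_loader, criterion, optimizer)  # hypothetical helper
#     scheduler.step()
```

`batch_size` would only be used when building the ImageNet-100 `DataLoader`, which is omitted here along with augmentations, since those vary across the paper's experiments.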