What Variables Affect Out-of-Distribution Generalization in Pretrained Models?

Authors: Md Yousuf Harun, Kyungbok Lee, Gianmarco Gallardo, Giri Krishnan, Christopher Kanan

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the factors influencing transferability and out-of-distribution (OOD) generalization of pre-trained DNN embeddings... We comprehensively investigate the impact of DNN architecture, training data, image resolution, and augmentations on transferability... Using 64 pre-trained ID backbones and 8,604 linear probes, we identify conditions that exacerbate, reduce, and eliminate the tunnel effect. [A linear-probe sketch illustrating this setup follows the table.]
Researcher Affiliation | Academia | ¹Rochester Institute of Technology, ²University of Rochester, ³Georgia Tech
Pseudocode | No | The paper describes methods and procedures in narrative text, but it does not contain any formally structured pseudocode or algorithm blocks.
Open Source Code | Yes | Project website and code: https://yousuf907.github.io/oodg. We intend to release the dataset of results and our SHAP Slope analysis code on our project website, where the code will have an open-source license.
Open Datasets | Yes | We used 11 datasets in total in our paper and they are ImageNet-1K, ImageNet-100, ImageNet-R, CIFAR-10, CIFAR-100, NINCO, CUB-200, Aircrafts, Oxford Pets, Flowers-102, and STL-10. All datasets are widely used and publicly available.
Dataset Splits | Yes | The dataset is split into 50,000 training images and 10,000 test images (CIFAR-10). Standard test sets are used for all ID datasets.
Hardware Specification | Yes | We ran all experiments using four NVIDIA A5000 GPUs, including training backbones and linear probes.
Software Dependencies | No | The paper states 'We implemented our code in Python using PyTorch.' but does not specify version numbers for Python, PyTorch, or any other libraries or frameworks.
Experiment Setup | Yes | For training VGGm-11/17 on ImageNet-100, we employ the AdamW optimizer with an LR of 6 × 10⁻³ and a WD of 5 × 10⁻² for batch size 512. The model is trained for 100 epochs using the Cosine Annealing LR scheduler with a linear warmup of 5 epochs. We use label smoothing of 0.1 with cross-entropy loss.
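The Experiment Setup row above can be read as a concrete recipe. The sketch below is not the authors' training code: a torchvision vgg11 and random tensors stand in for VGGm-11 and ImageNet-100, and composing the warmup and cosine schedules with SequentialLR (with an assumed warmup start factor and per-epoch stepping) is one plausible reading of the quoted scheduler; only the stated hyperparameters (AdamW, LR 6 × 10⁻³, WD 5 × 10⁻², batch size 512, 100 epochs, 5-epoch warmup, label smoothing 0.1) come from the paper.

```python
# A minimal sketch of the reported recipe, not the authors' implementation.
# torchvision's vgg11 and random tensors stand in for VGGm-11 and ImageNet-100.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import vgg11

EPOCHS, WARMUP_EPOCHS, BATCH_SIZE, NUM_CLASSES = 100, 5, 512, 100

model = vgg11(num_classes=NUM_CLASSES)  # placeholder for the paper's VGGm-11

# Dummy data standing in for ImageNet-100 (3x224x224 images, 100 classes).
dummy = TensorDataset(torch.randn(512, 3, 224, 224),
                      torch.randint(0, NUM_CLASSES, (512,)))
train_loader = DataLoader(dummy, batch_size=BATCH_SIZE, shuffle=True)

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)                     # label smoothing 0.1
optimizer = optim.AdamW(model.parameters(), lr=6e-3, weight_decay=5e-2)  # LR 6e-3, WD 5e-2

# 5-epoch linear warmup followed by cosine annealing over the remaining epochs.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.01, total_iters=WARMUP_EPOCHS),
        CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS),
    ],
    milestones=[WARMUP_EPOCHS],
)

for epoch in range(EPOCHS):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the LR schedule once per epoch
```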
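The Research Type row quotes the paper's core measurement: training linear probes on embeddings from frozen, pre-trained backbones and evaluating them on OOD data. The sketch below only illustrates that idea; a torchvision ResNet-18 with a forward hook and random tensors stand in for the paper's 64 ID backbones and OOD datasets, and none of the names, probe placement, or probe hyperparameters come from the authors' pipeline.

```python
# Illustrative linear probe on a frozen intermediate layer; not the authors' pipeline.
import torch
from torch import nn, optim
from torchvision.models import resnet18

backbone = resnet18(weights=None).eval()      # stand-in for a pre-trained ID backbone
for p in backbone.parameters():
    p.requires_grad_(False)                   # probes never update the backbone

features = {}
def hook(_module, _inp, out):
    # Global-average-pool the layer's feature map into an embedding vector.
    features["emb"] = out.mean(dim=(2, 3))

backbone.layer3.register_forward_hook(hook)   # probe one intermediate layer

probe = nn.Linear(256, 10)                    # layer3 of resnet18 has 256 channels; 10 OOD classes
optimizer = optim.AdamW(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy probe-training step on random data standing in for an OOD dataset.
x = torch.randn(32, 3, 224, 224)
y = torch.randint(0, 10, (32,))
with torch.no_grad():
    backbone(x)                               # forward pass fills features["emb"] via the hook
loss = loss_fn(probe(features["emb"]), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```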