What Variables Affect Out-of-Distribution Generalization in Pretrained Models?

Authors: Md Yousuf Harun, Kyungbok Lee, Gianmarco Gallardo, Giri Krishnan, Christopher Kanan

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the factors influencing transferability and out-of-distribution (OOD) generalization of pre-trained DNN embeddings... We comprehensively investigate the impact of DNN architecture, training data, image resolution, and augmentations on transferability... Using 64 pre-trained ID backbones and 8,604 linear probes, we identify conditions that exacerbate, reduce, and eliminate the tunnel effect. [A linear-probe sketch illustrating this setup follows the table.]
Researcher Affiliation | Academia | ¹Rochester Institute of Technology, ²University of Rochester, ³Georgia Tech
Pseudocode | No | The paper describes methods and procedures in narrative text, but it does not contain any formally structured pseudocode or algorithm blocks.
Open Source Code | Yes | Project website and code: https://yousuf907.github.io/oodg. We intend to release the dataset of results and our SHAP Slope analysis code on our project website, where the code will have an open-source license.
Open Datasets | Yes | We used 11 datasets in total in our paper and they are ImageNet-1K, ImageNet-100, ImageNet-R, CIFAR-10, CIFAR-100, NINCO, CUB-200, Aircrafts, Oxford Pets, Flowers-102, and STL-10. All datasets are widely used and publicly available.
Dataset Splits | Yes | The dataset is split into 50,000 training images and 10,000 test images (CIFAR-10). Standard test sets are used for all ID datasets.
Hardware Specification | Yes | We ran all experiments using four NVIDIA A5000 GPUs, including training backbones and linear probes.
Software Dependencies | No | The paper states 'We implemented our code in Python using PyTorch.' but does not specify version numbers for Python, PyTorch, or any other libraries or frameworks.
Experiment Setup | Yes | For training VGGm-11/17 on ImageNet-100, we employ the AdamW optimizer with an LR of 6 × 10⁻³ and a WD of 5 × 10⁻² for batch size 512. The model is trained for 100 epochs using the Cosine Annealing LR scheduler with a linear warmup of 5 epochs. We use label smoothing of 0.1 with cross-entropy loss.
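The Experiment Setup row above can be read as a concrete recipe. The sketch below is not the authors' training code: a torchvision vgg11 and random tensors stand in for VGGm-11 and ImageNet-100, and composing the warmup and cosine schedules with SequentialLR (with an assumed warmup start factor and per-epoch stepping) is one plausible reading of the quoted scheduler; only the stated hyperparameters (AdamW, LR 6 × 10⁻³, WD 5 × 10⁻², batch size 512, 100 epochs, 5-epoch warmup, label smoothing 0.1) come from the paper.

```python
# A minimal sketch of the reported recipe, not the authors' implementation.
# torchvision's vgg11 and random tensors stand in for VGGm-11 and ImageNet-100.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import vgg11

EPOCHS, WARMUP_EPOCHS, BATCH_SIZE, NUM_CLASSES = 100, 5, 512, 100

model = vgg11(num_classes=NUM_CLASSES)  # placeholder for the paper's VGGm-11

# Dummy data standing in for ImageNet-100 (3x224x224 images, 100 classes).
dummy = TensorDataset(torch.randn(512, 3, 224, 224),
                      torch.randint(0, NUM_CLASSES, (512,)))
train_loader = DataLoader(dummy, batch_size=BATCH_SIZE, shuffle=True)

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)                     # label smoothing 0.1
optimizer = optim.AdamW(model.parameters(), lr=6e-3, weight_decay=5e-2)  # LR 6e-3, WD 5e-2

# 5-epoch linear warmup followed by cosine annealing over the remaining epochs.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.01, total_iters=WARMUP_EPOCHS),
        CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS),
    ],
    milestones=[WARMUP_EPOCHS],
)

for epoch in range(EPOCHS):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the LR schedule once per epoch
```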
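The Research Type row quotes the paper's core measurement: training linear probes on embeddings from frozen, pre-trained backbones and evaluating them on OOD data. The sketch below only illustrates that idea; a torchvision ResNet-18 with a forward hook and random tensors stand in for the paper's 64 ID backbones and OOD datasets, and none of the names, probe placement, or probe hyperparameters come from the authors' pipeline.

```python
# Illustrative linear probe on a frozen intermediate layer; not the authors' pipeline.
import torch
from torch import nn, optim
from torchvision.models import resnet18

backbone = resnet18(weights=None).eval()      # stand-in for a pre-trained ID backbone
for p in backbone.parameters():
    p.requires_grad_(False)                   # probes never update the backbone

features = {}
def hook(_module, _inp, out):
    # Global-average-pool the layer's feature map into an embedding vector.
    features["emb"] = out.mean(dim=(2, 3))

backbone.layer3.register_forward_hook(hook)   # probe one intermediate layer

probe = nn.Linear(256, 10)                    # layer3 of resnet18 has 256 channels; 10 OOD classes
optimizer = optim.AdamW(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy probe-training step on random data standing in for an OOD dataset.
x = torch.randn(32, 3, 224, 224)
y = torch.randint(0, 10, (32,))
with torch.no_grad():
    backbone(x)                               # forward pass fills features["emb"] via the hook
loss = loss_fn(probe(features["emb"]), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```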