Scalable Diverse Model Selection for Accessible Transfer Learning

Authors: Daniel Bolya, Rohit Mittapalli, Judy Hoffman

NeurIPS 2021

Reproducibility Assessment

Each variable below lists the assessed result, followed by the LLM response that supports it.
Research Type: Experimental
LLM Response: "We formalize this setting as Scalable Diverse Model Selection and propose several benchmarks for evaluating on this task. We find that existing model selection and transferability estimation methods perform poorly here and analyze why this is the case. We then introduce simple techniques to improve the performance and speed of these algorithms. Finally, we iterate on existing methods to create PARC, which outperforms all other methods on diverse model selection."
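
For context, PARC (Pairwise Annotation Representation Comparison) scores a candidate pretrained model by extracting its features on a small probe set of target images and rank-correlating the pairwise feature distances with the pairwise label distances. Below is a minimal sketch of such a score; it is illustrative only, not the released implementation (see the code link further down), and the `parc_score` interface is an assumption:

```python
import numpy as np
from scipy.stats import spearmanr

def parc_score(features, labels_onehot):
    """PARC-style transferability score (sketch).

    features: (n, d) probe-set features from a candidate model.
    labels_onehot: (n, c) one-hot target labels for the same images.
    Higher scores should indicate better expected transfer.
    """
    # Pairwise (1 - Pearson correlation) distances between rows.
    dist_f = 1.0 - np.corrcoef(features)
    dist_y = 1.0 - np.corrcoef(labels_onehot)
    # Compare the two distance structures on the off-diagonal
    # lower triangle using Spearman (rank) correlation.
    idx = np.tril_indices_from(dist_f, k=-1)
    rho, _ = spearmanr(dist_f[idx], dist_y[idx])
    return rho
```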
Researcher Affiliation: Academia
LLM Response: Daniel Bolya (Georgia Tech, dbolya@gatech.edu); Rohit Mittapalli (Georgia Tech, rmittapalli3@gatech.edu); Judy Hoffman (Georgia Tech, judy@gatech.edu)
Pseudocode: No
LLM Response: The paper describes its methods in prose and mathematical formulas, but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured, code-like steps.
Open Source Code: Yes
LLM Response: "We have released all benchmarks and evaluation code at https://dbolya.github.io/parc/ in hopes to further the development of this promising area of research."
Open Datasets: Yes
LLM Response: "Thus, for this benchmark, we choose 6 well-known classification datasets of various difficulties that contain related subthemes: Pets: Stanford Dogs [26] and Oxford Pets [35], Birds: CUB200 [55] and NABirds [54], and Miscellaneous: CIFAR10 [29] and Caltech101 [14]. We also include VOC2007 [13] and ImageNet-1k [7] as the 7th and 8th source datasets, but not as targets."
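
The benchmark structure described in this excerpt can be summarized as a small configuration. The variable names below are illustrative, not taken from the released code:

```python
# Datasets as described in the excerpt above, grouped by subtheme.
THEMES = {
    "Pets": ["Stanford Dogs", "Oxford Pets"],
    "Birds": ["CUB200", "NABirds"],
    "Miscellaneous": ["CIFAR10", "Caltech101"],
}
# VOC2007 and ImageNet-1k serve as the 7th and 8th source datasets,
# but are not used as targets.
SOURCE_ONLY = ["VOC2007", "ImageNet-1k"]

SOURCES = [d for group in THEMES.values() for d in group] + SOURCE_ONLY
TARGETS = [d for group in THEMES.values() for d in group]
```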
Dataset Splits: No
LLM Response: The paper mentions using a "probe set" of n = 500 images for model selection (a form of validation for the selection method), but it does not specify explicit training/validation splits for the final fine-tuning of models on the larger target training data. While it states that the authors "employ grid search to find optimal hyperparameters", it does not say how the data was split for this search (e.g., percentages or counts for a validation set).
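
To make the probe-set protocol concrete, here is a minimal sketch of drawing such a subsample. The per-class balancing and the `sample_probe_set` interface are assumptions; the excerpt states only that n = 500:

```python
import random
from collections import defaultdict

def sample_probe_set(dataset, n=500, seed=0):
    """Draw a small probe set from the target training data (sketch).

    `dataset` is assumed to be a sequence of (image, label) pairs.
    Sampling is done per class so the probe set stays roughly balanced
    -- an illustrative choice, not specified in the paper excerpt.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset):
        by_class[label].append(idx)
    per_class = max(1, n // len(by_class))
    picked = []
    for indices in by_class.values():
        picked.extend(rng.sample(indices, min(per_class, len(indices))))
    rng.shuffle(picked)
    return picked[:n]
```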
Hardware Specification: Yes
LLM Response: "All models are trained on Titan Xp GPUs and all transferability methods are evaluated on the CPU."
Software Dependencies: No
LLM Response: The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other library versions) that would be needed for reproducibility.
Experiment Setup: No
LLM Response: The paper mentions general settings, such as images being resized to 224×224 and training with SGD plus grid search for optimal hyperparameters, but it does not provide the specific hyperparameter values (e.g., learning rate, batch size, number of epochs) used in the experiments.
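
For illustration, here is a sketch of the SGD-plus-grid-search setup the response refers to. Every hyperparameter value below (learning rates, weight decays, momentum, epoch count) is a placeholder, since the paper does not report them:

```python
import itertools
import torch

# Hypothetical grid -- the paper states only that SGD and grid search
# were used; these values are illustrative placeholders.
GRID = {
    "lr": [1e-2, 1e-3],
    "weight_decay": [0.0, 1e-4],
}

def fine_tune(model, train_loader, val_loader, lr, weight_decay, epochs=30):
    """Fine-tune with SGD and return validation accuracy (sketch)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=weight_decay)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:  # images resized to 224x224
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()
    # Evaluate on the held-out split.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            correct += (model(images).argmax(1) == labels).sum().item()
            total += labels.numel()
    return correct / total

def grid_search(model_fn, train_loader, val_loader):
    """Return the best (accuracy, hyperparameters) over the grid."""
    best = (-1.0, None)
    for lr, wd in itertools.product(GRID["lr"], GRID["weight_decay"]):
        acc = fine_tune(model_fn(), train_loader, val_loader, lr, wd)
        best = max(best, (acc, (lr, wd)))
    return best
```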