Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

Authors: Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alexander J. Smola, Xu Sun

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation shows that POMP achieves state-of-the-art performances on 21 datasets, e.g., 67.0% average accuracy on 10 classification datasets (+3.1% compared to CoOp) and 84.4 hIoU on open-vocabulary Pascal VOC segmentation (+6.9 compared to ZSSeg). Experimental results in Figure 1 show that POMP outperforms previous state-of-the-art (SOTA) models on a broad range of visual recognition tasks and datasets.
Researcher Affiliation | Collaboration | Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun; National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University; Amazon Web Services
Pseudocode | No | The paper does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Our code is available at https://github.com/amazon-science/prompt-pretraining.
Open Datasets | Yes | We conduct prompt pre-training on the ImageNet-21K dataset (official winter 2021 released version). We also evaluate on CIFAR10, FGVC Aircraft, Stanford Cars, SUN397, ImageNet-1K, Oxford-Pets, Oxford Flowers 102, Food-101, EuroSAT, DTD, UCF-101, COCO-Stuff, Pascal VOC, ADE20K, PASCAL Context, LVIS, COCO, and Objects365.
Dataset Splits | Yes | We follow the processing methods in [47], which involve cleaning invalid classes, allocating 50 images per class for a validation split, and crop-resizing all the images to 224×224 resolution. (See the split-and-preprocessing sketch after this table.)
Hardware Specification | Yes | We conduct all the experiments on 8 Nvidia V100 GPUs.
Software Dependencies | No | The paper mentions software like CLIP, MaskFormer, and CenterNet2, but does not provide specific version numbers for these or other ancillary software components.
Experiment Setup | Yes | The number of training samples for each class is 16 (16 shots), and the prompt length is 16. We sample 1,000 classes at each training step, i.e., K = 1000 in Eq. (4). We use the SGD optimizer with an initial learning rate of 0.002, decayed by the cosine annealing rule. The batch size is 32, and the maximum epoch is 20. (See the training-configuration sketch after this table.)
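
For the Dataset Splits row: the reported processing allocates 50 images per class to a validation split and crop-resizes everything to 224×224. Below is a minimal Python sketch of that kind of split, assuming a directory-per-class ImageNet-21K layout; the function name, the "invalid class" filter, and the torchvision preprocessing pipeline are illustrative assumptions, not the authors' released code.

    import os
    import random

    from torchvision import transforms

    def build_splits(root, val_per_class=50):
        """Allocate val_per_class images per class for validation; the rest
        form the training split. Classes too small to split are dropped, a
        stand-in for the paper's "cleaning invalid classes" step."""
        train, val = {}, {}
        for cls in sorted(os.listdir(root)):              # one directory per class
            images = sorted(os.listdir(os.path.join(root, cls)))
            if len(images) <= val_per_class:              # assumed validity criterion
                continue
            random.shuffle(images)
            val[cls] = images[:val_per_class]
            train[cls] = images[val_per_class:]
        return train, val

    # Crop-resize every image to 224x224, matching the reported resolution.
    preprocess = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])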
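
For the Experiment Setup row: the reported configuration (prompt length 16, K = 1000 sampled classes per step, SGD at 0.002 with cosine annealing, batch size 32, 20 epochs) can be sketched in PyTorch as below. The text encoder and data loader are replaced by placeholders, the temperature is assumed, and POMP's local-contrast correction term is omitted, so this is a schematic of the sampling-based training loop, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    NUM_CLASSES, K = 21000, 1000     # ~ImageNet-21K class count; classes sampled per step
    PROMPT_LEN, DIM = 16, 512        # prompt length 16; CLIP-like feature dim (assumed)

    # The shared prompt context is the only trainable parameter.
    prompt_ctx = torch.nn.Parameter(0.02 * torch.randn(PROMPT_LEN, DIM))

    optimizer = torch.optim.SGD([prompt_ctx], lr=0.002)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

    def class_embeddings(class_ids):
        """Placeholder for the frozen CLIP text encoder applied to
        [prompt_ctx; class-name tokens] for each sampled class."""
        name_emb = torch.randn(len(class_ids), DIM)       # stand-in for token features
        return F.normalize(name_emb + prompt_ctx.mean(0), dim=-1)

    # Toy stand-in for the 16-shot ImageNet-21K loader (batch size 32).
    loader = [(F.normalize(torch.randn(32, DIM), dim=-1),
               torch.randint(NUM_CLASSES, (32,))) for _ in range(4)]

    for epoch in range(20):                               # maximum epoch is 20
        for image_feats, labels in loader:
            # Sample ~K candidate classes per step, keeping the batch's
            # ground-truth labels inside the candidate set.
            neg = torch.randperm(NUM_CLASSES)[:K]
            cand = torch.unique(torch.cat([labels, neg]))            # size close to K
            target = (cand.unsqueeze(0) == labels.unsqueeze(1)).float().argmax(dim=1)
            logits = image_feats @ class_embeddings(cand).t() / 0.01 # temperature assumed
            loss = F.cross_entropy(logits, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()

Sampling roughly K of the ~20K classes per step is what keeps the softmax over class embeddings tractable on the 8 V100 GPUs reported above; the full-vocabulary softmax would not fit the same memory budget.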