Fine-Grained Visual Prompting

Authors: Lingfeng Yang, Yueze Wang, Xiang Li, Xinlong Wang, Jian Yang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our Fine-Grained Visual Prompting (FGVP) demonstrates superior performance in zero-shot comprehension of referring expressions on the RefCOCO, RefCOCO+, and RefCOCOg benchmarks. It outperforms prior methods by an average margin of 3.0% to 4.6%, with a maximum improvement of 12.5% on the RefCOCO+ testA subset. The part detection experiments conducted on the PACO dataset further validate the preponderance of FGVP over existing visual prompting techniques. In this section, we first evaluate individual visual prompting performance. Then, we compare FGVP with previous zero-shot methods on the referring expression comprehension and part detection tasks to show our effectiveness. (The zero-shot scoring protocol is illustrated in the first sketch below the table.)
Researcher Affiliation | Academia | 1) Nanjing University of Science and Technology, 2) Beijing Academy of Artificial Intelligence, 3) Nankai University. {yanglfnjust, csjyang}@njust.edu.cn, {yzwang, wangxinlong}@baai.ac.cn, xiang.li.implus@nankai.edu.cn
Pseudocode | No | The paper describes methods in text and uses figures for illustration but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/ylingfeng/FGVP.
Open Datasets | Yes | We conduct the experiments on several visual datasets, i.e., RefCOCO [63], RefCOCO+ [63], RefCOCOg [39], COCO [36], and PACO [44].
Dataset Splits | Yes | Table 2: Ablation study on the zero-shot performance of individual visual prompting on the validation sets of the COCO, PACO, RefCOCO, RefCOCO+, and RefCOCOg datasets using ground truth annotations (left) and proposals in referring expression comprehension (right), respectively. Table 4: Accuracy of part detection with ViT-L on the validation set of each benchmark.
Hardware Specification | Yes | All experiments are conducted on 8 Tesla V100 GPUs. Experiments are run on RefCOCO with a CLIP pre-trained ViT-L/14@336px on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions software such as CLIP, SAM, Timm, and PyTorch (via a reference) but does not provide specific version numbers for these or other key software components used in the experiments.
Experiment Setup | Yes | Next, we ablate the standard deviation of the Gaussian blur kernel for blur-based prompting [4] (Fig. 5), and a value of 100 achieves the best result. Notably, we set the grid size to 16 along one side of the image and use an NMS threshold of 0.7 by default. (The blur-based prompt is illustrated in the second sketch below.)
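
For readers reproducing the zero-shot referring expression comprehension results summarized in the Research Type row, the following minimal sketch shows the typical CLIP-based scoring loop: each candidate region is rendered as a visually prompted image, and the region whose embedding best matches the expression is selected. This is an illustrative sketch, not the authors' released implementation; the OpenAI clip package is assumed, and apply_visual_prompt is a hypothetical placeholder for whichever prompting strategy (box, circle, blur reverse mask, etc.) is under evaluation.

```python
# Illustrative sketch only (not the authors' released code): zero-shot referring
# expression comprehension by ranking visually prompted candidate regions with CLIP.
# `apply_visual_prompt` is a hypothetical callable implementing the prompting strategy.
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14@336px", device=device)

def select_region(image: Image.Image, candidates, expression: str, apply_visual_prompt):
    """Return the index of the candidate region that best matches `expression`."""
    prompted = [preprocess(apply_visual_prompt(image, cand)) for cand in candidates]
    image_batch = torch.stack(prompted).to(device)
    text = clip.tokenize([expression]).to(device)
    with torch.no_grad():
        image_feats = model.encode_image(image_batch)
        text_feats = model.encode_text(text)
        image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
        text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
        scores = (image_feats @ text_feats.T).squeeze(-1)  # one similarity per candidate
    return int(scores.argmax().item())
```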
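
The blur-based prompt ablated in the Experiment Setup row can be sketched as follows, assuming Pillow and NumPy; this is a hedged illustration rather than the released code. Pixels outside the target mask are blurred with a Gaussian whose radius approximates the standard deviation of 100, keeping the referred region sharp before the image is passed to CLIP.

```python
# Hedged illustration of blur-based visual prompting: blur everything outside the
# target mask with a strong Gaussian (radius ~ std 100, as ablated in the paper),
# leaving the referred region sharp.
import numpy as np
from PIL import Image, ImageFilter

def blur_reverse_mask(image: Image.Image, mask: np.ndarray, sigma: float = 100.0) -> Image.Image:
    """`mask` is an HxW boolean array aligned with `image`; True marks the target region."""
    blurred = image.filter(ImageFilter.GaussianBlur(radius=sigma))
    mask_img = Image.fromarray(mask.astype(np.uint8) * 255, mode="L")
    # Keep original pixels where the mask is set, blurred pixels elsewhere.
    return Image.composite(image, blurred, mask_img)
```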