Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts

Authors: Zhiwei Lin, Yongtao Wang, Zhi Tang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the long-tail instance segmentation dataset (LVIS) show that our method surpasses the previous open-ended method on the object detection task and can provide additional instance segmentation masks. Besides, VL-SAM achieves favorable performance on the corner case object detection dataset (CODA), demonstrating the effectiveness of VL-SAM in real-world applications.
Researcher Affiliation | Academia | Zhiwei Lin, Yongtao Wang, Zhi Tang — Wangxuan Institute of Computer Technology, Peking University, China
Pseudocode | No | The paper describes the proposed framework and its components in text and figures, but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | We do not provide new datasets and will release the demo after the paper is accepted.
Open Datasets | Yes | We evaluate VL-SAM on the LVIS dataset [14], which has a long tail of categories and annotations for over 1000 object categories. ... To further demonstrate the effectiveness of the proposed method in the real-world application, we present the results of VL-SAM on corner case object detection dataset CODA for autonomous driving in Table 2.
Dataset Splits | No | The paper mentions evaluating on "LVIS minival", which is a validation split, but it does not explicitly provide the train/validation/test splits (e.g., percentages or sample counts) needed for reproduction, nor does it define how the splits were partitioned beyond citing the evaluation datasets.
Hardware Specification | Yes | All models are inferred on an 80G A800 machine.
Software Dependencies | No | The paper names the vision-language and segmentation models used (e.g., CogVLM-17B, Vicuna-7B-v1.5, SAM with ViT-Huge), but it does not specify foundational software dependencies such as the programming language (e.g., Python), deep learning framework (e.g., PyTorch, TensorFlow), or other libraries with explicit version numbers.
Experiment Setup | Yes | We set the temperature to 0.8 and top-p for nucleus sampling to 0.1 for CogVLM-17B.
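To make the quoted decoding settings concrete, the sketch below shows how temperature scaling (0.8) and nucleus (top-p = 0.1) filtering interact during sampling. This is a minimal, self-contained illustration of the standard technique, not the authors' code or CogVLM's actual decoding implementation; the function name and structure are hypothetical.

```python
import math
import random

def nucleus_sample(logits, temperature=0.8, top_p=0.1, rng=random):
    """Sample a token index with temperature scaling plus nucleus (top-p) filtering.

    Illustrative sketch only; real decoders operate on full vocabulary logits.
    """
    # Temperature < 1 sharpens the distribution before filtering.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of highest-probability tokens whose cumulative
    # mass reaches top_p; with top_p = 0.1 this is often a single token.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalise over the nucleus and draw a sample.
    kept_mass = sum(probs[i] for i in keep)
    r = rng.random() * kept_mass
    for i in keep:
        r -= probs[i]
        if r <= 0:
            return i
    return keep[-1]
```

With such a low top-p, decoding is close to greedy: the nucleus usually collapses to the single most probable token, which trades diversity for more deterministic, grounded outputs.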