Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
Authors: Zhiwei Lin, Yongtao Wang, Zhi Tang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the long-tail instance segmentation dataset (LVIS) show that our method surpasses the previous open-ended method on the object detection task and can provide additional instance segmentation masks. Besides, VL-SAM achieves favorable performance on the corner case object detection dataset (CODA), demonstrating the effectiveness of VL-SAM in real-world applications. |
| Researcher Affiliation | Academia | Zhiwei Lin Yongtao Wang Zhi Tang Wangxuan Institute of Computer Technology, Peking University, China |
| Pseudocode | No | The paper describes the proposed framework and its components using text and figures, but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | We do not provide new datasets and will release the demo after the paper is accepted. |
| Open Datasets | Yes | We evaluate VL-SAM on the LVIS dataset [14], which has a long tail of categories and annotations for over 1000 object categories. ... To further demonstrate the effectiveness of the proposed method in the real-world application, we present the results of VL-SAM on corner case object detection dataset CODA for autonomous driving in Table 2. |
| Dataset Splits | No | The paper mentions evaluating on "LVIS minival", which is a validation split. However, it does not explicitly provide the overall train/validation/test splits (e.g., percentages or sample counts) needed for reproduction, nor does it define how the splits were partitioned for its own setup beyond citing the evaluation datasets. |
| Hardware Specification | Yes | All models are inferred on an 80G A800 machine. |
| Software Dependencies | No | The paper mentions specific vision-language and segmentation models used (e.g., "CogVLM-17B," "Vicuna-7B-v1.5," "SAM with ViT-Huge"), but it does not specify foundational software dependencies like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries with their explicit version numbers. |
| Experiment Setup | Yes | We set the temperature to 0.8 and top-p for nucleus sampling to 0.1 for CogVLM-17B. |
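The quoted setup reports only two decoding hyperparameters. As a point of reference for what temperature 0.8 and top-p 0.1 mean during sampling, here is a minimal pure-Python sketch of temperature-scaled nucleus sampling; the function name and example logits are illustrative, not taken from the paper:

```python
import math
import random

def nucleus_sample(logits, temperature=0.8, top_p=0.1, rng=None):
    """Sample a token index: temperature-scaled softmax, then top-p filtering."""
    rng = rng or random.Random(0)
    # Temperature-scaled softmax (numerically stabilized).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of highest-probability tokens whose
    # cumulative mass reaches top_p (the "nucleus").
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the nucleus and sample from it.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

Note that top-p = 0.1 is a very restrictive nucleus: whenever the most likely token already carries at least 10% of the probability mass, it is the only candidate, so decoding is close to greedy despite the moderate temperature.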