Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

Authors: Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive experiments demonstrate that Matcher has superior generalization performance across various segmentation tasks, all without the need for training.
Researcher Affiliation | Collaboration | 1 Zhejiang University, China {yangliu9610,zhumuzhi,liht,haochen.cad,chunhuashen}@zju.edu.cn; 2 Beijing Academy of Artificial Intelligence wangxinlong@baai.ac.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is at: https://github.com/aim-uofa/Matcher
Open Datasets | Yes | For few-shot semantic segmentation, we evaluate the performance of Matcher on COCO-20i (Nguyen & Todorovic, 2019), FSS-1000 (Li et al., 2020), and LVIS-92i. ... Based on PASCAL VOC 2010 (Everingham et al., 2010) and its body part annotations (Chen et al., 2014), we build the PASCAL-Part dataset following (Morabia et al., 2020). ... PACO (Ramanathan et al., 2023) is a newly released dataset. ... DAVIS 2017 val (Pont-Tuset et al., 2017), and DAVIS 2016 val (Perazzi et al., 2016).
Dataset Splits | Yes | FSS-1000 consists of mask-annotated images from 1,000 classes, with 520, 240, and 240 classes in the training, validation, and test sets, respectively. ... COCO-20i partitions the 80 categories of the MSCOCO dataset (Lin et al., 2014) into four cross-validation folds, each containing 60 training classes and 20 test classes. ... We split these parts into four folds, each with about 76 different object parts. (See the fold-partition sketch after the table.)
Hardware Specification | No | The research was in part supported by the Supercomputing Center of Hangzhou City University, which provided advanced computing resources.
Software Dependencies | No | The paper mentions software components like the DINOv2 and SAM models, but does not provide specific version numbers for programming languages, libraries, or other dependencies.
Experiment Setup | Yes | We set the input image size to 518×518 for one-shot semantic segmentation and object part segmentation, and 896×504 for video object segmentation. We set the number of clusters to 8. For COCO-20i and LVIS-92i, we sample the instance-level points from the matched points and dense image points to encourage SAM to output more instance masks. We set the filtering thresholds emd and purity to 0.67 and 0.02, and set α, β, and λ to 1.0, 0.0, and 0.0, respectively. (See the configuration sketch after the table.)
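
As a reading aid for the Dataset Splits row, here is a minimal sketch of the COCO-20i fold partition, assuming the standard interleaved split of Nguyen & Todorovic (2019), in which fold i holds out every fourth class starting at index i. The function name `coco20i_split` and the interleaving convention are our assumptions, not code from the paper or its repository.

```python
# Minimal sketch of the COCO-20i cross-validation partition described above.
# Assumption: the interleaved split of Nguyen & Todorovic (2019), where
# fold i's test classes are the class indices congruent to i modulo 4.

NUM_CLASSES = 80  # MSCOCO categories
NUM_FOLDS = 4     # COCO-20i uses four cross-validation folds

def coco20i_split(fold: int) -> tuple[list[int], list[int]]:
    """Return (train_class_ids, test_class_ids) for one fold in 0..3."""
    assert 0 <= fold < NUM_FOLDS
    test_classes = [c for c in range(NUM_CLASSES) if c % NUM_FOLDS == fold]
    train_classes = [c for c in range(NUM_CLASSES) if c % NUM_FOLDS != fold]
    return train_classes, test_classes

train, test = coco20i_split(fold=0)
assert len(train) == 60 and len(test) == 20  # 60 training / 20 test classes per fold
```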
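Similarly, the hyperparameters quoted in the Experiment Setup row can be gathered into one illustrative configuration. This is a sketch only: the key names (input_size, tau_emd, tau_purity, and so on) are assumptions chosen for readability, not identifiers from the Matcher repository.

```python
# Illustrative configuration mirroring the Experiment Setup row.
# Key names are our assumptions, not Matcher's actual config schema.

MATCHER_CONFIG = {
    # Input resolution depends on the task.
    "input_size": {
        "one_shot_semantic_seg": (518, 518),
        "object_part_seg": (518, 518),
        "video_object_seg": (896, 504),
    },
    "num_clusters": 8,       # number of clusters
    # Filtering thresholds for candidate masks.
    "tau_emd": 0.67,         # emd threshold
    "tau_purity": 0.02,      # purity threshold
    # Weights α, β, λ from the quoted setup.
    "alpha": 1.0,
    "beta": 0.0,
    "lambda": 0.0,
    # For these benchmarks, instance-level points are sampled from the
    # matched points and dense image points so SAM outputs more instance masks.
    "instance_level_sampling": {"COCO-20i", "LVIS-92i"},
}
```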