Learning Mask-aware CLIP Representations for Zero-Shot Segmentation

Authors: Siyu Jiao, Yunchao Wei, Yaowei Wang, Yao Zhao, Humphrey Shi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on the popular zero-shot benchmarks. With MAFT, the performance of the state-of-the-art methods is promoted by a large margin: 50.4% (+8.2%) on COCO, 81.8% (+3.2%) on Pascal-VOC, and 8.7% (+4.3%) on ADE20K in terms of mIoU for unseen classes.
Researcher Affiliation | Collaboration | Siyu Jiao (1,2,3), Yunchao Wei (1,2,3), Yaowei Wang (3), Yao Zhao (1,2,3), Humphrey Shi (4,5). 1 Institute of Information Science, Beijing Jiaotong University; 2 Peng Cheng Laboratory; 3 Beijing Key Laboratory of Advanced Information Science and Network; 4 Georgia Institute of Technology; 5 Picsart AI Research. jiaosiyu99@bjtu.edu.cn
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at github.com/jiaosiyu1999/MAFT.git.
Open Datasets | Yes | We evaluate our MAFT on three commonly used zero-shot segmentation benchmarks: COCO-Stuff [2], Pascal-VOC [7], and ADE20K [40].
Dataset Splits | Yes | ADE20K contains 25k images for training and 2k images for validation. For the zero-shot setting, we follow [6] to choose 847 classes present in both training and validation sets, and split them into 572 seen and 275 unseen classes.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or cloud instance specifications) used for running experiments are provided.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages or libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | We take the batch size of 16 and set the CLIP input image size to 480×480. The optimizer is AdamW with a learning rate of 0.00001 and weight decay of 0.00001. The number of training iterations is set to 100 for Pascal-VOC, 1000 for COCO-Stuff and 5000 for ADE20K.
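The reported hyperparameters are enough to reconstruct the optimization loop in outline. Below is a minimal sketch, assuming a PyTorch setup; the model, batch, and loss are dummy placeholders (the paper does not specify them here), and only the batch size, 480×480 input resolution, AdamW settings, and per-dataset iteration counts come from the quoted setup.

```python
# Sketch of the reported training configuration, not the authors' code.
# From the paper: batch size 16, 480x480 CLIP input, AdamW with
# lr = 1e-5 and weight decay = 1e-5, and 100 / 1000 / 5000 iterations
# for Pascal-VOC / COCO-Stuff / ADE20K. Model, data, and loss are dummies.
import torch
from torch import nn
from torch.optim import AdamW

ITERS = {"pascal-voc": 100, "coco-stuff": 1000, "ade20k": 5000}
BATCH_SIZE, INPUT_SIZE = 16, 480

model = nn.Conv2d(3, 1, kernel_size=1)  # placeholder standing in for the fine-tuned CLIP image encoder
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=1e-5)

for step in range(ITERS["pascal-voc"]):
    images = torch.randn(BATCH_SIZE, 3, INPUT_SIZE, INPUT_SIZE)  # dummy image batch
    target = torch.zeros(BATCH_SIZE, 1, INPUT_SIZE, INPUT_SIZE)  # dummy target
    loss = nn.functional.mse_loss(model(images), target)         # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The sketch only illustrates the optimizer and schedule choices; the actual mask-aware fine-tuning objective and data pipeline are described in the paper and released code.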