Learning Mask-aware CLIP Representations for Zero-Shot Segmentation
Authors: Siyu Jiao, Yunchao Wei, Yaowei Wang, Yao Zhao, Humphrey Shi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the popular zero-shot benchmarks. With MAFT, the performance of the state-of-the-art methods is promoted by a large margin: 50.4% (+8.2%) on COCO, 81.8% (+3.2%) on Pascal-VOC, and 8.7% (+4.3%) on ADE20K in terms of mIoU for unseen classes. |
| Researcher Affiliation | Collaboration | Siyu Jiao (1,2,3), Yunchao Wei (1,2,3), Yaowei Wang (3), Yao Zhao (1,2,3), Humphrey Shi (4,5). 1: Institute of Information Science, Beijing Jiaotong University; 2: Peng Cheng Laboratory; 3: Beijing Key Laboratory of Advanced Information Science and Network; 4: Georgia Institute of Technology; 5: Picsart AI Research. Contact: jiaosiyu99@bjtu.edu.cn |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at github.com/jiaosiyu1999/MAFT.git. |
| Open Datasets | Yes | We evaluate our MAFT on three commonly used zero-shot segmentation benchmarks: COCOStuff [2], Pascal-VOC [7], and ADE20K [40]. |
| Dataset Splits | Yes | ADE20K contains 25k images for training and 2k images for validation. For the zero-shot setting, we follow [6] to choose 847 classes present in both training and validation sets, and split them into 572 seen and 275 unseen classes. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or cloud instance specifications) used for running experiments are provided. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages or libraries (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | We take the batch size of 16 and set CLIP input image size to 480×480. The optimizer is AdamW with a learning rate of 0.00001 and weight decay of 0.00001. The number of training iterations is set to 100 for Pascal-VOC, 1000 for COCO-Stuff and 5000 for ADE20K. |
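The reported setup can be collected into a small configuration helper. The sketch below is illustrative only: field and dataset names are assumptions, not identifiers from the MAFT codebase; only the hyperparameter values (batch size 16, 480×480 CLIP input, AdamW with lr and weight decay of 1e-5, and the per-benchmark iteration counts) come from the quoted excerpt.

```python
# Hedged sketch of the training hyperparameters reported for MAFT.
# Field names are hypothetical; the actual repository may organize its
# configs differently.

BASE_CONFIG = {
    "batch_size": 16,
    "clip_input_size": (480, 480),   # CLIP input image size
    "optimizer": "AdamW",
    "learning_rate": 1e-5,
    "weight_decay": 1e-5,
}

# Per-benchmark training iterations quoted from the paper.
ITERATIONS = {
    "Pascal-VOC": 100,
    "COCO-Stuff": 1000,
    "ADE20K": 5000,
}


def make_config(dataset: str) -> dict:
    """Merge the shared hyperparameters with the dataset-specific
    iteration count, raising on an unknown benchmark name."""
    if dataset not in ITERATIONS:
        raise KeyError(f"unknown benchmark: {dataset}")
    return {**BASE_CONFIG, "dataset": dataset, "max_iters": ITERATIONS[dataset]}


if __name__ == "__main__":
    cfg = make_config("COCO-Stuff")
    print(cfg["max_iters"])  # 1000
```

A helper like this makes the iteration-count differences between the three benchmarks explicit, which is the main thing a reproduction would need to get right.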