AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation
Authors: Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang, Yanfeng Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness, we annotate three types of datasets with attribute descriptions, and conduct extensive experiments and ablation studies. |
| Researcher Affiliation | Academia | Chaofan Ma¹, Yuhuan Yang¹, Chen Ju¹, Fei Zhang¹, Ya Zhang¹·², Yanfeng Wang¹·² — ¹ Coop. Medianet Innovation Center, Shanghai Jiao Tong University; ² Shanghai AI Laboratory. {chaofanma, yangyuhuan, ju_chen, ferenas, ya_zhang, wangyanfeng622}@sjtu.edu.cn |
| Pseudocode | No | The paper describes the method using figures and textual descriptions but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | To evaluate the significance of attribute understanding for OVSS, we annotate attribute descriptions on three types of datasets, namely, PASCAL series [13, 16, 35], COCO series [28, 8], and Fantastic Beasts. PASCAL-5^i contains 20 categories that are divided into 4 folds of 5 classes each, i.e., {5^i}_{i=0}^{3}. COCO-20^i is more challenging with 80 categories that are also divided into 4 folds, i.e., {20^i}_{i=0}^{3}, with each fold having 20 categories. |
| Dataset Splits | Yes | PASCAL-5^i contains 20 categories that are divided into 4 folds of 5 classes each, i.e., {5^i}_{i=0}^{3}. COCO-20^i is more challenging with 80 categories that are also divided into 4 folds, i.e., {20^i}_{i=0}^{3}, with each fold having 20 categories. Of the four folds in the two datasets, one is used for evaluation, while the other three are used for training. We evaluate on the 1.5k validation images with 20 categories (PAS-20). |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like CLIP and the AdamW optimizer but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We adopt CLIP ViT-L and ResNet101 as our backbone, and choose aggregation stages L = 4. Numbers of learnable clusters in each stage are (15, 10, 5, 1). During training, the number of sampled attributes is N = 15. The AdamW optimizer is used with a cosine LR scheduler, first warming up for 10 epochs from an initial learning rate of 4e-6 to 1e-3; the weight decay is set to 0.05. |
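The training schedule quoted above (10-epoch warmup from 4e-6 to 1e-3, then cosine decay) can be sketched as a plain learning-rate function. This is a minimal illustration, not the authors' code: the total epoch count (`total_epochs=100`) and the final minimum LR (`min_lr=0.0`) are assumptions, since the paper excerpt does not state them.

```python
import math

def lr_at_epoch(epoch, total_epochs=100, warmup_epochs=10,
                init_lr=4e-6, peak_lr=1e-3, min_lr=0.0):
    """Warmup-plus-cosine schedule matching the reported setup.

    total_epochs and min_lr are assumed values for illustration.
    """
    if epoch < warmup_epochs:
        # Linear warmup from init_lr (4e-6) to peak_lr (1e-3).
        return init_lr + (peak_lr - init_lr) * epoch / warmup_epochs
    # Cosine decay from peak_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop, the same shape is typically obtained by pairing `torch.optim.AdamW(params, weight_decay=0.05)` with a warmup-wrapped cosine scheduler (e.g. from the timm library, which provides a `CosineLRScheduler` with warmup arguments).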