AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation

Authors: Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang, Yanfeng Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the effectiveness, we annotate three types of datasets with attribute descriptions, and conduct extensive experiments and ablation studies.
Researcher Affiliation | Academia | Chaofan Ma¹, Yuhuan Yang¹, Chen Ju¹, Fei Zhang¹, Ya Zhang¹,², Yanfeng Wang¹,² — ¹ Coop. Medianet Innovation Center, Shanghai Jiao Tong University; ² Shanghai AI Laboratory. {chaofanma, yangyuhuan, ju_chen, ferenas, ya_zhang, wangyanfeng622}@sjtu.edu.cn
Pseudocode | No | The paper describes the method using figures and textual descriptions but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the described methodology.
Open Datasets | Yes | To evaluate the significance of attribute understanding for OVSS, we annotate attribute descriptions on three types of datasets, namely, PASCAL series [13, 16, 35], COCO series [28, 8], and Fantastic Beasts. PASCAL-5^i contains 20 categories that are divided into 4 folds of 5 classes each, i.e., {5^i}_{i=0}^{3}. COCO-20^i is more challenging with 80 categories that are also divided into 4 folds, i.e., {20^i}_{i=0}^{3}, with each fold having 20 categories.
Dataset Splits | Yes | PASCAL-5^i contains 20 categories that are divided into 4 folds of 5 classes each, i.e., {5^i}_{i=0}^{3}. COCO-20^i is more challenging with 80 categories that are also divided into 4 folds, i.e., {20^i}_{i=0}^{3}, with each fold having 20 categories. Of the four folds in the two datasets, one is used for evaluation, while the other three are used for training. We evaluate on the 1.5k validation images with 20 categories (PAS-20).
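The cross-validation fold protocol quoted above (4 folds, one held out for evaluation, three for training) can be sketched as follows. This is a minimal illustration only: the function name and the assumption that classes are numbered 0–19 in consecutive fold order are hypothetical, not taken from the paper.

```python
def split_folds(eval_fold, num_classes=20, num_folds=4):
    """Return (train_classes, eval_classes) for one fold of a
    PASCAL-5^i-style split: classes are partitioned into num_folds
    contiguous groups; one group is held out, the rest are trained on."""
    per_fold = num_classes // num_folds  # 5 classes per fold for PASCAL-5^i
    folds = [list(range(f * per_fold, (f + 1) * per_fold))
             for f in range(num_folds)]
    eval_classes = folds[eval_fold]
    train_classes = [c for f, fold in enumerate(folds)
                     if f != eval_fold
                     for c in fold]
    return train_classes, eval_classes

# Fold 5^0: classes 0-4 are held out, classes 5-19 are used for training.
train_cls, eval_cls = split_folds(eval_fold=0)
```

The same helper covers COCO-20^i with `num_classes=80`, giving 20 held-out classes per fold.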
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions software components like CLIP and the AdamW optimizer but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We adopt CLIP ViT-L and ResNet-101 as our backbones, and choose aggregation stages L = 4. The numbers of learnable clusters in each stage are (15, 10, 5, 1). During training, the number of sampled attributes is N = 15. The AdamW optimizer is used with a cosine LR scheduler, first warming up for 10 epochs from an initial learning rate of 4e-6 to 1e-3; the weight decay is set to 0.05.
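The learning-rate schedule quoted above (10 warmup epochs from 4e-6 to the peak 1e-3, then cosine decay) can be sketched as a plain function of the epoch index. The total epoch count and the final (minimum) learning rate are assumptions marked in the code; the paper excerpt does not state them.

```python
import math

def lr_at_epoch(epoch, total_epochs=100, warmup_epochs=10,
                init_lr=4e-6, peak_lr=1e-3, min_lr=0.0):
    """Linear warmup followed by cosine decay.

    warmup_epochs, init_lr, and peak_lr follow the quoted setup;
    total_epochs and min_lr are ASSUMED values for illustration.
    """
    if epoch < warmup_epochs:
        # Linear warmup from init_lr to peak_lr over warmup_epochs.
        return init_lr + (peak_lr - init_lr) * epoch / warmup_epochs
    # Cosine decay from peak_lr down to min_lr over the remaining epochs.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * t))
```

In a PyTorch training loop this corresponds to pairing `torch.optim.AdamW` (with `weight_decay=0.05` as quoted) with a warmup-plus-cosine scheduler such as timm's `CosineLRScheduler`.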