Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective

Authors: Qishuai Wen, Chun-Guang Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on the ADE20K dataset find that DEPICT consistently outperforms its black-box counterpart, Segmenter, and it is lightweight and more robust.
Researcher Affiliation | Academia | Qishuai Wen and Chun-Guang Li, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, P.R. China. {wqs,lichunguang}@bupt.edu.cn
Pseudocode | No | The paper describes operations using mathematical equations and textual explanations, but it does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code and models are available at https://github.com/QishuaiWen/DEPICT.
Open Datasets | Yes | We conduct extensive experiments on the ADE20K [47], Cityscapes [10], and Pascal Context [27] datasets.
Dataset Splits | Yes | Experiments conducted on the ADE20K dataset find that DEPICT consistently outperforms its black-box counterpart, Segmenter, and it is lightweight and more robust. On ADE20K, we use three MSSA layers for DEPICT-SA and three MSSA layers followed by three MSCA layers for DEPICT-CA on most variants, whereas the exceptions and further settings for the other two datasets can be found in Appendix B. Table 1: Comparison on the ADE20K validation set.
Hardware Specification | No | The NeurIPS Paper Checklist in the document answers 'No' to question 8 ('For each experiment, does the paper provide sufficient information on the computer resources...'), with the justification: 'The effectiveness of DEPICT is robust with varying computer resources and we have controlled variables for fair comparisons. Therefore, we choose not to report the computer resources.'
Software Dependencies | No | Appendix B describes variant settings and implementation details, but it does not specify concrete version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or their underlying libraries (e.g., CUDA).
Experiment Setup | Yes | For fair comparisons to Segmenter, we use the same settings, including backbones, data augmentation, optimization, and inference; we refer readers to [35] for more details. On ADE20K, we use three MSSA layers for DEPICT-SA and three MSSA layers followed by three MSCA layers for DEPICT-CA on most variants, whereas the exceptions and further settings for the other two datasets can be found in Appendix B. For DEPICT-SA on the ADE20K dataset, we use three MSSA layers and set #heads × dim_head to 3×100 across all variants, with the exception of six MSSA layers for the ViT-L variant. For DEPICT-CA, we use three MSSA layers followed by three MSCA layers, and set #heads × dim_head to 3×100 in MSSA and 3×50 in MSCA across all variants, with the exceptions of 3×50 in MSSA for the ViT-S variant and six MSSA layers for the ViT-L variant.
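To make the quoted layer/head settings concrete, below is a minimal PyTorch sketch of the DEPICT-CA decoder layout under the stated configuration (three self-attention layers followed by three cross-attention layers, with #heads × dim_head of 3×100 and 3×50). It uses standard nn.MultiheadAttention as a stand-in for the paper's MSSA/MSCA operators, and the class name, the bridging projection, and the dot-product mask read-out are illustrative assumptions, not the authors' released implementation (see the linked repository for that). DEPICT-SA would correspond to keeping only the self-attention stage.

```python
import torch
import torch.nn as nn


class DecoderSketch(nn.Module):
    """Decoder layout sketch: a self-attention stage over patch + class tokens,
    followed by a cross-attention stage where class tokens attend to patch tokens.
    Plain nn.MultiheadAttention stands in for the paper's MSSA/MSCA operators."""

    def __init__(self, num_classes=150, dim_sa=300, heads_sa=3,
                 dim_ca=150, heads_ca=3, num_sa=3, num_ca=3):
        super().__init__()
        # #heads x dim_head = 3 x 100 -> inner dim 300; 3 x 50 -> inner dim 150
        self.cls_emb = nn.Parameter(torch.randn(1, num_classes, dim_sa))
        self.sa_layers = nn.ModuleList(
            nn.MultiheadAttention(dim_sa, heads_sa, batch_first=True)
            for _ in range(num_sa))
        self.bridge = nn.Linear(dim_sa, dim_ca)  # hypothetical projection between stages
        self.ca_layers = nn.ModuleList(
            nn.MultiheadAttention(dim_ca, heads_ca, batch_first=True)
            for _ in range(num_ca))

    def forward(self, patch_tokens):  # patch_tokens: (B, N, dim_sa) encoder features
        B, N, _ = patch_tokens.shape
        x = torch.cat([patch_tokens, self.cls_emb.expand(B, -1, -1)], dim=1)
        for sa in self.sa_layers:      # "MSSA" stage (plain self-attention here)
            x = x + sa(x, x, x, need_weights=False)[0]
        patches, cls = self.bridge(x[:, :N]), self.bridge(x[:, N:])
        for ca in self.ca_layers:      # "MSCA" stage (plain cross-attention here)
            cls = cls + ca(cls, patches, patches, need_weights=False)[0]
        # Per-class masks via similarity between patch tokens and class tokens
        return torch.einsum("bnd,bkd->bnk", patches, cls)  # (B, N, num_classes)


# Example: 32x32 = 1024 patch tokens of width 300 -> per-class masks of shape (2, 1024, 150)
masks = DecoderSketch()(torch.randn(2, 1024, 300))
```

The residual connections and the projection between the two stages are design choices of this sketch only; the paper derives its attention operators from a compression objective, so the actual layer internals differ from the plain attention used here.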