PatchDCT: Patch Refinement for High Quality Instance Segmentation

Authors: Qinrou Wen, Jirui Yang, Xue Yang, Kewei Liang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our method achieves 2.0%, 3.2%, 4.5% AP and 3.4%, 5.3%, 7.0% Boundary AP improvements over Mask-RCNN on COCO, LVIS, and Cityscapes, respectively. It also surpasses DCT-Mask by 0.7%, 1.1%, 1.3% AP and 0.9%, 1.7%, 4.2% Boundary AP on COCO, LVIS, and Cityscapes, respectively. Besides, the performance of PatchDCT is also competitive with other state-of-the-art methods.
Researcher Affiliation | Collaboration | Qinrou Wen (1), Jirui Yang (2), Xue Yang (3), Kewei Liang (1). (1) School of Mathematical Sciences, Zhejiang University; (2) Alibaba Group; (3) Department of CSE, MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University
Pseudocode | No | The paper describes its method and pipeline in text and uses Figure 2 to illustrate the pipeline, but it does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | PyTorch code: https://github.com/olivia-w12/PatchDCT
Open Datasets | Yes | We evaluate our method on two standard instance segmentation datasets: COCO (Lin et al., 2014) and Cityscapes (Cordts et al., 2016). Following (Kirillov et al., 2020), we also report AP* and AP*_B, which evaluate COCO val2017 with high-quality annotations provided by LVIS (Gupta et al., 2019).
Dataset Splits | Yes | Cityscapes is a dataset focused on urban street scenes. It contains 8 categories for instance segmentation, providing 2,975, 500, and 1,525 high-resolution images (1,024 × 2,048) for training, validation, and testing, respectively.
Hardware Specification | Yes | Runtime is measured on a single A100. ... about 1.5 FPS degradation on the A100 GPU. ... Mask Transfiner runs at 5.5 FPS on the A100 GPU.
Software Dependencies | No | The paper states that the model is built on DCT-Mask and that the algorithm is implemented in Detectron2. It also links to PyTorch code in the abstract, but it does not specify version numbers for any of these software components.
Experiment Setup | Yes | We set the patch size to 8, and each patch is represented by a 6-dimensional DCT vector. Our model is class-specific by default, i.e., it predicts one mask per class. L1 loss and cross-entropy loss are used for DCT vector regression and patch classification, respectively. By default, only one PatchDCT module is used, and both λ0 and λ1 are set to 1. We implement our algorithm based on Detectron2 (Wu et al., 2019), and all hyperparameters remain the same as Mask-RCNN in Detectron2. Unless otherwise stated, the 1× learning schedule is used.
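To make the per-patch representation in the Experiment Setup row concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes, as in DCT-Mask, that each 8×8 binary mask patch is transformed with an orthonormal 2-D DCT-II and that the 6 kept coefficients are the lowest-frequency ones in zigzag order. The 0.5 decoding threshold, the three patch classes, and the mapping of λ0 to the L1 regression term and λ1 to the cross-entropy term are illustrative assumptions, and every function name below is hypothetical.

```python
import math

N = 8  # patch size, as stated in the setup
K = 6  # DCT coefficients kept per patch, as stated in the setup

def dct_matrix(n):
    """Orthonormal DCT-II basis: C[k][i] = a(k) * cos(pi * (2i + 1) * k / (2n))."""
    mat = []
    for k in range(n):
        a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        mat.append([a * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return mat

C = dct_matrix(N)

def zigzag_indices(n):
    """(u, v) frequency pairs in JPEG-style zigzag order, lowest frequencies first.
    The first 6 entries are exactly the first three anti-diagonals (1 + 2 + 3)."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

KEPT = zigzag_indices(N)[:K]

def encode_patch(patch):
    """Hypothetical encoder: 2-D DCT F = C P C^T, truncated to the K zigzag coefficients."""
    return [sum(C[u][i] * patch[i][j] * C[v][j] for i in range(N) for j in range(N))
            for (u, v) in KEPT]

def decode_patch(vec, threshold=0.5):
    """Hypothetical decoder: inverse DCT P = C^T F C with dropped coefficients set to 0,
    then binarized at an assumed threshold of 0.5."""
    coeffs = [[0.0] * N for _ in range(N)]
    for (u, v), value in zip(KEPT, vec):
        coeffs[u][v] = value
    recon = [[sum(C[u][i] * coeffs[u][v] * C[v][j] for u in range(N) for v in range(N))
              for j in range(N)] for i in range(N)]
    return [[1 if x >= threshold else 0 for x in row] for row in recon]

def l1_loss(pred, target):
    """Mean absolute error between predicted and target DCT vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single patch."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def patch_loss(pred_vec, target_vec, logits, label, lam0=1.0, lam1=1.0):
    """Assumed weighting: lam0 on L1 DCT regression, lam1 on patch classification.
    Both default to 1, matching the stated setting lambda0 = lambda1 = 1."""
    return lam0 * l1_loss(pred_vec, target_vec) + lam1 * cross_entropy(logits, label)
```

For example, a patch whose left half is foreground survives the round trip unchanged (decode_patch(encode_patch(half)) == half), because a single vertical edge at the patch midpoint is captured by the lowest horizontal frequencies; patches with finer detail are only approximated by 6 coefficients.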