X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
Authors: Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves +2.6 box AP and +2.1 mask AP gains on all classes, and even more significant gains of +6.8 box AP and +6.5 mask AP on long-tail classes. Our code and models are available at https://github.com/yoctta/XPaste. We perform extensive experiments to validate the superiority of X-Paste. |
| Researcher Affiliation | Collaboration | University of Science and Technology of China; Microsoft. Correspondence to: Jianmin Bao <jianmin.bao@microsoft.com>, Wenbo Zhou <welbeckz@ustc.edu.cn>. |
| Pseudocode | No | The paper describes its methodology in narrative text and block diagrams (Figure 1) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and models are available at https://github.com/yoctta/XPaste. |
| Open Datasets | Yes | Datasets. We conduct experiments on object detection and instance segmentation on the LVIS (Gupta et al., 2019) and MS-COCO (Lin et al., 2014) datasets. |
| Dataset Splits | Yes | LVIS dataset contains 100k training images, and 20k validation images. It has 1203 categories... MS-COCO dataset contains 118K training, 5K validation, and 20K test-dev images. We use the official split for training. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or detailed specifications of the machines used for running experiments. |
| Software Dependencies | No | The paper mentions software frameworks and models such as CenterNet2, Detectron2, Stable Diffusion V1.4, and the CLIP model, but does not provide specific version numbers for the key software dependencies required for reproduction (e.g., Detectron2 version, PyTorch version). |
| Experiment Setup | Yes | The training configurations are set as follows: training resolution is set to 640, the batch size is 32, and a 4× schedule (48 epochs) is used. ... For Stable Diffusion, the number of diffusion steps is set to 200 with the classifier-free guidance scale set to 5.0. ... For Instance Filtering, we set the CLIP threshold to 0.21 to filter all the obtained instances. ... When a pasted instance occludes an object in the background image, we remove fully occluded objects and update mask and bounding box annotations accordingly. ... The number of instances pasted to each background image is set to 20 for training. |
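The Experiment Setup row quotes concrete generation and filtering settings. Below is a minimal sketch of how those settings might be wired together, assuming the Hugging Face diffusers and transformers APIs; the prompt template, checkpoint names, and the cosine-similarity scoring are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch: generate candidate instances with Stable Diffusion V1.4 and
# filter them by CLIP image-text similarity, using the settings quoted above
# (200 diffusion steps, classifier-free guidance 5.0, CLIP threshold 0.21).
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

sd = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

CLIP_THRESHOLD = 0.21  # instance-filtering threshold quoted from the paper

def generate_and_filter(category: str, n: int = 8):
    """Generate n images for a category prompt and keep those whose
    CLIP similarity to the prompt exceeds the threshold."""
    prompt = f"a photo of a single {category}"  # prompt template is an assumption
    images = sd(
        [prompt] * n,
        num_inference_steps=200,  # diffusion steps from the paper
        guidance_scale=5.0,       # classifier-free guidance scale from the paper
    ).images

    inputs = clip_proc(text=[prompt], images=images,
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        img_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = clip.get_text_features(input_ids=inputs["input_ids"],
                                         attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    scores = (img_emb @ txt_emb.T).squeeze(-1)  # cosine similarity per image
    return [img for img, s in zip(images, scores.tolist()) if s > CLIP_THRESHOLD]
```

In practice the generated images would still need foreground segmentation before pasting; this sketch only covers the generation and CLIP-filtering stages named in the setup description.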
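The same row also describes the paste step: up to 20 instances are composited per background image, fully occluded objects are removed, and masks and boxes are updated. A minimal sketch of that bookkeeping is below; the data structures are illustrative assumptions, not the repository's actual ones.

```python
# Hedged sketch of the copy-paste step: composite up to 20 instances onto a
# background, shrink occluded masks, drop fully occluded annotations, and
# refresh bounding boxes, as described in the setup quoted above.
import numpy as np

def mask_to_box(mask: np.ndarray):
    """Tight [x0, y0, x1, y1] box around a non-empty boolean mask."""
    ys, xs = np.nonzero(mask)
    return [int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1]

def paste_instances(image, annotations, instances, max_paste: int = 20):
    """image: HxWx3 array; annotations: list of {'mask': HxW bool, ...};
    instances: (rgb_patch, alpha_mask) pairs already placed on an HxW canvas."""
    image = image.copy()
    for patch, alpha in instances[:max_paste]:
        # Paste the new instance on top of everything placed so far.
        image[alpha] = patch[alpha]
        # Existing objects (including earlier pastes) lose the occluded pixels.
        for ann in annotations:
            ann["mask"] &= ~alpha
        annotations.append({"mask": alpha.copy()})
    # Remove fully occluded objects and refresh bounding boxes.
    annotations = [a for a in annotations if a["mask"].any()]
    for a in annotations:
        a["bbox"] = mask_to_box(a["mask"])
    return image, annotations
```

Because each new instance is pasted on top and subtracted from every existing mask, the final annotations reflect visible regions only, which is what makes the occlusion-removal rule in the quoted setup well defined.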