FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
Authors: Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of synthetic images on two widely adopted semantic segmentation benchmarks, i.e., ADE20K [76] and COCO-Stuff [9]. They are highly challenging due to the complex taxonomy. COCO-Stuff is composed of 118,287/5,000 training/validation images, spanning over 171 semantic classes. In comparison, ADE20K is more limited in training images, containing 20,210/2,000 training/validation images and covering 150 classes. We investigate different paradigms to leverage synthetic images, including (1) jointly training on real and synthetic images, and (2) pre-training on synthetic ones and then fine-tuning with real ones. We observe remarkable gains (e.g., 48.7 → 52.0) under both paradigms. |
| Researcher Affiliation | Collaboration | Lihe Yang (1), Xiaogang Xu (2,3), Bingyi Kang (4), Yinghuan Shi (5), Hengshuang Zhao (1); 1: The University of Hong Kong, 2: Zhejiang Lab, 3: Zhejiang University, 4: ByteDance, 5: Nanjing University |
| Pseudocode | No | The paper describes algorithmic steps in text and provides mathematical formulas, but it does not include a clearly labeled pseudocode block or algorithm figure. |
| Open Source Code | Yes | https://github.com/LiheYoung/FreeMask |
| Open Datasets | Yes | We evaluate the effectiveness of synthetic images on two widely adopted semantic segmentation benchmarks, i.e., ADE20K [76] and COCO-Stuff [9]. |
| Dataset Splits | Yes | COCO-Stuff is composed of 118,287/5,000 training/validation images, spanning over 171 semantic classes. In comparison, ADE20K is more limited in training images, containing 20,210/2,000 training/validation images and covering 150 classes. |
| Hardware Specification | Yes | We use 8 Nvidia Tesla V100 GPUs for our training experiments. For example, it takes around 5.8 seconds to synthesize a single image with a V100 GPU. In practice, we speed up the synthesis process with 24 V100 GPUs. |
| Software Dependencies | No | The paper mentions using "MMSegmentation codebase" but does not specify its version or the versions of other core software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | In pre-training, we exactly follow the hyper-parameters of regular training. In joint training, we over-sample real images to the same number of synthetic images. The learning rate and batch size are the same as the regular training paradigm. Due to the actually halved batch size of real images in each iteration, we double the training iterations to iterate over real training images for the same epochs as regular training. Other hyper-parameters are detailed in the appendix. |
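The joint-training setup quoted above over-samples real images so that each training iteration draws real and synthetic images in equal proportion. A minimal sketch of that over-sampling step is below; the filenames and counts are hypothetical (the paper specifies only that real images are repeated to match the number of synthetic images, with training iterations doubled to preserve real-image epochs):

```python
import random

def oversample_to_match(real_items, synthetic_count, seed=0):
    """Repeat the real-image list (topping up with a random subset)
    until it matches the synthetic-image count, then shuffle.

    This mirrors the joint-training paradigm described in the paper,
    where real images are over-sampled to the number of synthetic ones.
    """
    rng = random.Random(seed)
    repeats = synthetic_count // len(real_items)
    remainder = synthetic_count - repeats * len(real_items)
    oversampled = real_items * repeats + rng.sample(real_items, remainder)
    rng.shuffle(oversampled)
    return oversampled

# Hypothetical example: ADE20K's 20,210 real training images matched
# against six synthetic images per real one.
real = [f"real_{i}.jpg" for i in range(20210)]
synthetic_count = 20210 * 6
mixed_real = oversample_to_match(real, synthetic_count)
assert len(mixed_real) == synthetic_count
```

Because each batch is then half real and half synthetic, the effective real-image batch size is halved, which is why the paper doubles the training iterations to keep the number of real-image epochs unchanged.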