FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
Authors: Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of synthetic images on two widely adopted semantic segmentation benchmarks, i.e., ADE20K [76] and COCO-Stuff [9]. They are highly challenging due to the complex taxonomy. COCO-Stuff is composed of 118,287/5,000 training/validation images, spanning over 171 semantic classes. In comparison, ADE20K is more limited in training images, containing 20,210/2,000 training/validation images and covering 150 classes. We investigate different paradigms to leverage synthetic images, including (1) jointly training on real and synthetic images, and (2) pre-training on synthetic ones and then fine-tuning with real ones. We observe remarkable gains (e.g., 48.7 → 52.0) under both paradigms. |
| Researcher Affiliation | Collaboration | Lihe Yang (1), Xiaogang Xu (2,3), Bingyi Kang (4), Yinghuan Shi (5), Hengshuang Zhao (1); 1: The University of Hong Kong, 2: Zhejiang Lab, 3: Zhejiang University, 4: ByteDance, 5: Nanjing University |
| Pseudocode | No | The paper describes algorithmic steps in text and provides mathematical formulas, but it does not include a clearly labeled pseudocode block or algorithm figure. |
| Open Source Code | Yes | https://github.com/LiheYoung/FreeMask |
| Open Datasets | Yes | We evaluate the effectiveness of synthetic images on two widely adopted semantic segmentation benchmarks, i.e., ADE20K [76] and COCO-Stuff [9]. |
| Dataset Splits | Yes | COCO-Stuff is composed of 118,287/5,000 training/validation images, spanning over 171 semantic classes. In comparison, ADE20K is more limited in training images, containing 20,210/2,000 training/validation images and covering 150 classes. |
| Hardware Specification | Yes | We use 8 Nvidia Tesla V100 GPUs for our training experiments. For example, it takes around 5.8 seconds to synthesize a single image with a V100 GPU. In practice, we speed up the synthesis process with 24 V100 GPUs. |
| Software Dependencies | No | The paper mentions using "MMSegmentation codebase" but does not specify its version or the versions of other core software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | In pre-training, we exactly follow the hyper-parameters of regular training. In joint training, we over-sample real images to the same number of synthetic images. The learning rate and batch size are the same as the regular training paradigm. Due to the actually halved batch size of real images in each iteration, we double the training iterations to iterate over real training images for the same epochs as regular training. Other hyper-parameters are detailed in the appendix. |
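The joint-training setup quoted above over-samples real images so that each training iteration draws real and synthetic images in equal proportion. A minimal sketch of that over-sampling step is below; the filenames and counts are hypothetical (the paper specifies only that real images are repeated to match the number of synthetic images, with training iterations doubled to preserve real-image epochs):

```python
import random

def oversample_to_match(real_items, synthetic_count, seed=0):
    """Repeat the real-image list (topping up with a random subset)
    until it matches the synthetic-image count, then shuffle.

    This mirrors the joint-training paradigm described in the paper,
    where real images are over-sampled to the number of synthetic ones.
    """
    rng = random.Random(seed)
    repeats = synthetic_count // len(real_items)
    remainder = synthetic_count - repeats * len(real_items)
    oversampled = real_items * repeats + rng.sample(real_items, remainder)
    rng.shuffle(oversampled)
    return oversampled

# Hypothetical example: ADE20K's 20,210 real training images matched
# against six synthetic images per real one.
real = [f"real_{i}.jpg" for i in range(20210)]
synthetic_count = 20210 * 6
mixed_real = oversample_to_match(real, synthetic_count)
assert len(mixed_real) == synthetic_count
```

Because each batch is then half real and half synthetic, the effective real-image batch size is halved, which is why the paper doubles the training iterations to keep the number of real-image epochs unchanged.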