AIMS: All-Inclusive Multi-Level Segmentation for Anything

Authors: Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate the effectiveness and generalization capacity of our method compared to other state-of-the-art methods on a single dataset or the concurrent work on segment anything."
Researcher Affiliation | Collaboration | Lu Qi (1), Jason Kuen (2), Weidong Guo (3), Jiuxiang Gu (2), Zhe Lin (2), Bo Du (4), Yu Xu (3), Ming-Hsuan Yang (1,5); affiliations: 1 UC Merced, 2 Adobe Research, 3 QQ Browser Lab (Tencent), 4 Wuhan University, 5 Google Research
Pseudocode | No | The paper describes methods and formulas but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "We will make our code and training model publicly available."
Open Datasets | Yes | "We train our AIMS model on existing segmentation datasets such as Pascal Panoptic Parts [6], COCO-PSG [8], PACO [9], and Entity Seg [5]. We construct our training set by aggregating images from five segmentation datasets, including COCO [53], Entity Seg [5], Pascal Panoptic Parts (PPP) [6], PACO [9], and COCO-PSG [8]."
Dataset Splits | Yes | "Initially, we select 1069 and 1000 validation images from PPP [6] (which covers the part and entity levels) and COCO-PSG [8] (which covers the entity and relation levels), respectively. Following this, we eliminate any duplicate images in the unified training set that are present in the validation images, resulting in a refined training set comprised of approximately 236.7K unique images." A sketch of this construction is given below the table.
Hardware Specification | Yes | "During each training iteration, we sample the data and tasks as introduced in the sampling strategy of Section 3.4, with a batch size of 64 on 8 A100 GPUs."
Software Dependencies | No | The paper does not provide specific version numbers for the software dependencies or libraries used (e.g., Python, PyTorch, or TensorFlow).
Experiment Setup | Yes | "We train our model for 36,000 iterations using a base learning rate of 0.0001 and weights pre-trained on COCO-Entity [3], with the exception of images contained in our validation set. The longer edge size of the images is set to 1,333 pixels, while the shorter edge size is randomly sampled between 640 and 800 pixels with a stride of 32 pixels. The learning rate is decayed by a factor of 0.1 after 28,000 and 33,000 iterations, respectively. During each training iteration, we sample the data and tasks as introduced in the sampling strategy of Section 3.4, with a batch size of 64 on 8 A100 GPUs." A sketch of this schedule is given below the table.
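
To make the Dataset Splits row concrete, here is a minimal sketch of the unified-training-set construction it quotes: aggregate images from the five source datasets, hold out the 1069 PPP and 1000 COCO-PSG validation images, and drop any training image that duplicates a held-out image (or another training image). The annotation file names, the COCO-style record format, and the load_image_records helper are illustrative assumptions, not the authors' released code.

```python
# Sketch of the unified training set: aggregate five datasets, hold out the
# PPP / COCO-PSG validation images, and deduplicate. File names and record
# format are hypothetical; only the counts come from the quoted text.
import json

def load_image_records(annotation_path):
    """Load COCO-style annotations and return the list of image records."""
    with open(annotation_path) as f:
        return json.load(f)["images"]

train_sources = [
    "coco_train.json", "entityseg_train.json", "ppp_train.json",
    "paco_train.json", "coco_psg_train.json",  # hypothetical file names
]
val_sources = ["ppp_val_1069.json", "coco_psg_val_1000.json"]

# Collect identifiers of the 1069 + 1000 held-out validation images.
val_ids = {
    rec["file_name"]
    for path in val_sources
    for rec in load_image_records(path)
}

# Aggregate training images, skipping validation duplicates and repeats
# across the five sources, yielding the reported ~236.7K unique images.
seen, unified_train = set(), []
for path in train_sources:
    for rec in load_image_records(path):
        key = rec["file_name"]
        if key in val_ids or key in seen:
            continue
        seen.add(key)
        unified_train.append(rec)

print(f"Unified training set: {len(unified_train)} images")
```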
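
Similarly, the Experiment Setup row maps onto a standard PyTorch training schedule. The sketch below wires the quoted numbers into the loop: base learning rate 0.0001, decay by 0.1 after 28,000 and 33,000 of 36,000 iterations, shorter edge sampled from 640 to 800 pixels at a stride of 32, longer edge capped at 1,333 pixels, and a batch size of 64 over 8 GPUs (8 per GPU). The optimizer choice (AdamW) and the stand-in model are assumptions, since the paper does not specify them in the quoted passage.

```python
# Sketch of the reported optimization schedule and multi-scale resizing.
# Only the numeric values come from the quoted setup; the model and the
# AdamW optimizer are placeholders.
import random
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(8, 8)  # stand-in for the AIMS model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # base LR 0.0001
# Decay the LR by a factor of 0.1 after 28,000 and 33,000 iterations.
scheduler = MultiStepLR(optimizer, milestones=[28_000, 33_000], gamma=0.1)

def sample_train_size(min_edge=640, max_edge=800, stride=32, max_long=1333):
    """Sample a shorter-edge length in [640, 800] at a stride of 32 pixels;
    the longer edge is capped at 1,333 pixels when resizing."""
    short = random.randrange(min_edge, max_edge + stride, stride)
    return short, max_long

for iteration in range(36_000):
    short_edge, long_cap = sample_train_size()
    # ... resize and assemble the batch (64 images over 8 GPUs, 8 per GPU),
    # run the forward/backward pass, then update the weights ...
    optimizer.step()
    scheduler.step()
```

MultiStepLR reproduces the two-milestone step decay exactly; in a real pipeline the sampled shorter-edge size would be applied per batch inside the data loader rather than in the training loop itself.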