MobileInst: Video Instance Segmentation on the Mobile
Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on one single CPU core of the Snapdragon 778G Mobile Platform, without other methods of acceleration. |
| Researcher Affiliation | Collaboration | Renhong Zhang1*, Tianheng Cheng1 , Shusheng Yang1, Haoyi Jiang1, Shuai Zhang2, Jiancheng Lyu2, Xin Li2, Xiaowen Ying2, Dashan Gao2, Wenyu Liu1, Xinggang Wang1 1 School of EIC, Huazhong University of Science & Technology 2 Qualcomm AI Research, Qualcomm Technologies, Inc |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a link or explicit statement) for the source code of the methodology described. |
| Open Datasets | Yes | COCO. The COCO dataset is a touchstone for instance segmentation methods, with 118k, 5k, and 20k images for training, validation, and testing respectively. MobileInst is trained on train2017 and evaluated on val2017 or test-dev2017. YouTube-VIS. YouTube-VIS 2019 is a large-scale dataset for VIS, with 2,883 videos and 4,883 instances covering 40 categories. YouTube-VIS 2021 expands it to 1.5× videos and 2× instances with improved 40 categories. We evaluate our methods on the validation set of both datasets1. 1All datasets were solely downloaded and evaluated by the University. |
| Dataset Splits | Yes | The COCO dataset is a touchstone for instance segmentation methods, with 118k, 5k, and 20k images for training, validation, and testing respectively. MobileInst is trained on train2017 and evaluated on val2017 or test-dev2017. |
| Hardware Specification | Yes | We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on one single CPU core of the Snapdragon 778G Mobile Platform, without other methods of acceleration. On the COCO dataset, MobileInst achieves 31.2 mask AP and 433 ms on the mobile CPU. ...GPU denotes NVIDIA 2080 Ti and Mobile denotes Snapdragon 778G. |
| Software Dependencies | No | The inference speeds of all models are measured using TNN framework2 on the CPU core of Snapdragon 778G without other methods of acceleration. 2TNN: a uniform deep learning inference framework. The paper mentions the 'TNN framework' but does not specify its version number or any other software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Instance Segmentation. We use the AdamW optimizer with an initial learning rate of 1×10⁻⁴ and set the backbone multiplier to 0.5. Following the training schedule and data augmentation of (Cheng et al. 2022b), all models are trained for 270k iterations with a batch size of 64, decaying the learning rate by 10× at 210k and 250k. We apply random flip and scale jitter to augment the training images. More precisely, the shorter edge varies from 416 to 640 pixels, while the longer edge remains under 864 pixels. Video Instance Segmentation. The models are initialized with weights from the instance segmentation model pretrained on COCO train2017. We set the learning rate to 5×10⁻⁵ and train for 12 epochs with a 10× decay at the 8th and 11th epochs. We only employ basic data augmentation, such as resizing the shorter side of the image to 360, without using any additional data or tricks. |
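The step schedule quoted in the Experiment Setup row (base learning rate 1×10⁻⁴, a 0.5× backbone multiplier, and 10× decays at 210k and 250k iterations) can be expressed compactly. The helper below is a minimal, framework-free sketch of that schedule; the function name and its interface are illustrative, not from the paper's (unreleased) code.

```python
def lr_at(iteration, base_lr=1e-4, backbone_mult=1.0,
          milestones=(210_000, 250_000), gamma=0.1):
    """Learning rate at a given training iteration under the paper's
    reported step schedule: the rate starts at base_lr (scaled by the
    backbone multiplier for backbone parameters) and is multiplied by
    gamma (a 10x decay) at each milestone iteration."""
    lr = base_lr * backbone_mult
    for m in milestones:
        if iteration >= m:
            lr *= gamma
    return lr

# Head parameters use the full base rate; backbone parameters use 0.5x.
head_lr_start = lr_at(0)                          # 1e-4
backbone_lr_start = lr_at(0, backbone_mult=0.5)   # 5e-5
head_lr_late = lr_at(260_000)                     # after both decays
```

The same function reproduces the VIS fine-tuning schedule by passing `base_lr=5e-5` and epoch-indexed milestones `(8, 11)`.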