MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

Authors: De-An Huang, Zhiding Yu, Anima Anandkumar

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate MinVIS on three datasets: YouTube-VIS 2019/2021 [1] and Occluded VIS (OVIS) [12].
Researcher Affiliation | Collaboration | De-An Huang (NVIDIA, deahuang@nvidia.com); Zhiding Yu (NVIDIA, zhidingy@nvidia.com); Anima Anandkumar (Caltech & NVIDIA, anima@caltech.edu)
Pseudocode | No | The paper describes the inference pipeline textually and visually in Figure 1, but does not provide structured pseudocode or an algorithm block.
Open Source Code | Yes | Code is available at: https://github.com/NVlabs/MinVIS
Open Datasets | Yes | We evaluate MinVIS on three datasets: YouTube-VIS 2019/2021 [1] and Occluded VIS (OVIS) [12].
Dataset Splits | Yes | YouTube-VIS 2019 contains 2238/302/343 videos for training/validation/testing, while YouTube-VIS 2021 expands the dataset to 2985/421/453 videos for training/validation/testing and includes higher-quality annotations. OVIS has 25 object classes and contains 607/140/154 videos for training/validation/testing.
Hardware Specification | No | The paper's Responsibility Statement explicitly answers 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?' with '[No]'. No specific hardware details are provided in the paper.
Software Dependencies | No | The paper mentions following Mask2Former VIS [3] for hyper-parameters, but does not provide version numbers for software dependencies such as libraries, frameworks, or programming languages.
Experiment Setup | Yes | Unless otherwise noted, our hyper-parameters follow Mask2Former VIS [3]. All models are pre-trained with COCO instance segmentation [35]. For OVIS, we use the same hyper-parameters as YouTube-VIS 2019, except training for 10k iterations instead of 6k. For training losses, the weights are 5.0 for Lmask and 2.0 for Lcls. All results of MinVIS are averaged over 3 random seeds.
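As a minimal sketch, the quantitative details quoted in the table above (the per-dataset split counts and the training loss weights) can be captured in a few lines of Python. The dict layout, function names, and the placeholder loss combination below are illustrative assumptions, not MinVIS's actual code or API; only the numbers (2238/302/343, 2985/421/453, 607/140/154, and the weights 5.0 and 2.0) come from the report itself.

```python
# Per-dataset video split counts as quoted in the report.
# The structure is illustrative, not an official dataset API.
SPLITS = {
    "YouTube-VIS 2019": {"train": 2238, "val": 302, "test": 343},
    "YouTube-VIS 2021": {"train": 2985, "val": 421, "test": 453},
    "OVIS":             {"train": 607,  "val": 140, "test": 154},
}

def total_videos(name: str) -> int:
    """Sum of train/val/test video counts for one dataset."""
    return sum(SPLITS[name].values())

# Loss weights as reported: 5.0 for L_mask, 2.0 for L_cls.
W_MASK = 5.0
W_CLS = 2.0

def total_loss(l_mask: float, l_cls: float) -> float:
    """Hypothetical weighted sum of the two loss terms; the actual
    Mask2Former/MinVIS loss has more structure than this."""
    return W_MASK * l_mask + W_CLS * l_cls

print(total_videos("OVIS"))       # 607 + 140 + 154 = 901
print(total_loss(0.4, 0.1))       # 5.0*0.4 + 2.0*0.1 = 2.2
```

A sanity check like `total_videos` is a quick way to verify transcribed split counts against a dataset's published totals.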