MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
Authors: De-An Huang, Zhiding Yu, Anima Anandkumar
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MinVIS on three datasets: YouTube-VIS 2019/2021 [1] and Occluded VIS (OVIS) [12]. |
| Researcher Affiliation | Collaboration | De-An Huang (NVIDIA, deahuang@nvidia.com); Zhiding Yu (NVIDIA, zhidingy@nvidia.com); Anima Anandkumar (Caltech & NVIDIA, anima@caltech.edu) |
| Pseudocode | No | The paper describes the inference pipeline textually and visually in Figure 1, but does not provide structured pseudocode or an algorithm block (an illustrative sketch of the described tracking step appears after this table). |
| Open Source Code | Yes | Code is available at: https://github.com/NVlabs/MinVIS |
| Open Datasets | Yes | We evaluate MinVIS on three datasets: YouTube-VIS 2019/2021 [1] and Occluded VIS (OVIS) [12]. |
| Dataset Splits | Yes | YouTube-VIS 2019 contains 2238/302/343 videos for training/validation/testing, while YouTube-VIS 2021 expands the dataset to 2985/421/453 videos for training/validation/testing and includes higher-quality annotations. OVIS has 25 object classes and contains 607/140/154 videos for training/validation/testing. |
| Hardware Specification | No | The paper's 'Responsibility Statement' explicitly states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]'. No specific hardware details are provided in the paper. |
| Software Dependencies | No | The paper mentions following 'Mask2Former VIS [3]' for hyper-parameters, but does not provide specific version numbers for software dependencies such as libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | Unless otherwise noted, our hyper-parameters follow Mask2Former VIS [3]. All models are pre-trained with COCO instance segmentation [35]. For OVIS, we use the same hyper-parameters as YouTube-VIS 2019 except training for 10k iterations instead of 6k. For training losses, the weights are 5.0 for $L_{\text{mask}}$ and 2.0 for $L_{\text{cls}}$. All results of MinVIS are averaged over 3 random seeds. |
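Assuming the usual additive combination from Mask2Former [3] (where $L_{\text{mask}}$ itself bundles the per-pixel and dice terms; see [3] for the per-term definitions), the quoted weights imply the overall training objective

$$L = 5.0\,L_{\text{mask}} + 2.0\,L_{\text{cls}}$$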
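As flagged in the Pseudocode row, the paper gives no algorithm block. The sketch below is a minimal, illustrative reconstruction of the inference pipeline it describes: a frame-level segmenter (Mask2Former in the paper) is run on every frame independently, and instances are linked across frames by bipartite matching of query embeddings between adjacent frames. The function name `link_tracks` and all variable names are our own, not the released repository's API, and the sketch assumes a fixed number of queries per frame, as in Mask2Former.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_tracks(frame_queries):
    """Link per-frame instance predictions into video-level tracks.

    frame_queries: list of (N, D) arrays, one per frame, holding the query
    embeddings produced independently by a frame-level segmenter. Returns
    one index array per frame; entry i of each array names the query in
    that frame assigned to track i (frame 0's query order defines tracks).
    """
    # L2-normalize so that dot products are cosine similarities.
    normed = [q / np.linalg.norm(q, axis=1, keepdims=True) for q in frame_queries]

    assignments = [np.arange(normed[0].shape[0])]  # frame 0 defines track ids
    prev = normed[0]
    for cur in normed[1:]:
        sim = prev @ cur.T                    # track-to-query cosine similarity
        _, col = linear_sum_assignment(-sim)  # Hungarian matching, maximizing similarity
        assignments.append(col)               # track i -> query col[i] in this frame
        prev = cur[col]                       # matched queries become the new references
    return assignments
```

The masks and class logits predicted for each query then inherit the track id of their matched query, which is how the method produces video-level instance segmentations without any video-based training.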