Hybrid Instance-Aware Temporal Fusion for Online Video Instance Segmentation
Authors: Xiang Li, Jinglu Wang, Xiao Li, Yan Lu (pp. 1429–1437)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have been conducted on popular VIS datasets, i.e., Youtube-VIS-19/21. Our model achieves the best performance among all online VIS methods. We conduct extensive ablation studies on Youtube-VIS-2019 to show the effectiveness of different components of our method. |
| Researcher Affiliation | Collaboration | Xiang Li (1,2*), Jinglu Wang (2), Xiao Li (2), Yan Lu (2); 1: Department of Electrical and Computer Engineering, Carnegie Mellon University; 2: Microsoft Research Asia; xl6@andrew.cmu.edu, {jinglwa, xili11, yanlu}@microsoft.com |
| Pseudocode | No | No clearly labeled pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on two extensively used VIS datasets Youtube-VIS-2019 and Youtube-VIS-2021. |
| Dataset Splits | Yes | Youtube-VIS-2019 has 40 categories, 4,883 unique video instances, and 131k high-quality manual annotations. There are 2,238 training videos, 302 validation videos, and 343 test videos in it. Youtube-VIS-2021 is an improved version of the Youtube-VIS-2019 dataset, which contains 8,171 unique video instances and 232k high-quality manual annotations. There are 2,985 training videos, 421 validation videos, and 453 test videos in this dataset. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or type of computing cluster) used for running the experiments were mentioned in the paper. |
| Software Dependencies | No | The paper only mentions the "Tensorflow2 framework" without specifying a version number or listing other software dependencies with their versions. |
| Experiment Setup | Yes | All frames are resized and padded to 641×641 during training and inference. We train our model for 35k iterations with a poly learning rate policy, where the learning rate is multiplied by (1 − iter/iter_max)^0.9 at each iteration, with an initial learning rate of 0.001 for all experiments. The batch size is 32, and an Adam (Kingma and Ba 2014) optimizer with β1 = 0.9, β2 = 0.999, and weight decay = 0 is leveraged. Multi-scale training is adopted to obtain a strong baseline. We select the adjacent three frames as reference frames if not otherwise specified. |
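
The poly learning-rate policy quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `poly_lr` and its parameter names are our own, with the base learning rate (0.001), iteration budget (35k), and power (0.9) taken from the quoted setup.

```python
def poly_lr(iteration, base_lr=0.001, max_iter=35_000, power=0.9):
    """Poly learning-rate schedule: base_lr * (1 - iter/iter_max) ** power.

    Decays smoothly from base_lr at iteration 0 toward 0 at max_iter.
    """
    return base_lr * (1.0 - iteration / max_iter) ** power

# At the start of training the rate equals the initial learning rate:
print(poly_lr(0))        # 0.001
# Half-way through, the rate has decayed to base_lr * 0.5 ** 0.9:
print(poly_lr(17_500))
```

In a typical training loop this value would be assigned to the optimizer's learning rate once per iteration before the parameter update.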