Hybrid Instance-Aware Temporal Fusion for Online Video Instance Segmentation

Authors: Xiang Li, Jinglu Wang, Xiao Li, Yan Lu (pp. 1429-1437)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments have been conducted on popular VIS datasets, i.e., Youtube-VIS-19/21. Our model achieves the best performance among all online VIS methods. We conduct extensive ablation studies on Youtube-VIS-2019 to show the effectiveness of the different components of our method.
Researcher Affiliation | Collaboration | Xiang Li (1, 2), Jinglu Wang (2), Xiao Li (2), Yan Lu (2); 1: Department of Electrical and Computer Engineering, Carnegie Mellon University; 2: Microsoft Research Asia. Contact: xl6@andrew.cmu.edu, {jinglwa, xili11, yanlu}@microsoft.com
Pseudocode | No | No clearly labeled pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not contain an explicit statement about releasing the code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | We evaluate our method on two widely used VIS datasets, Youtube-VIS-2019 and Youtube-VIS-2021.
Dataset Splits | Yes | Youtube-VIS-2019 has 40 categories, 4,883 unique video instances, and 131k high-quality manual annotations, with 2,238 training videos, 302 validation videos, and 343 test videos. Youtube-VIS-2021 is an improved version of Youtube-VIS-2019, containing 8,171 unique video instances and 232k high-quality manual annotations, with 2,985 training videos, 421 validation videos, and 453 test videos.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or type of computing cluster) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper only mentions the "Tensorflow2 framework" without specifying a version number or listing other software dependencies and their versions.
Experiment Setup | Yes | All frames are resized and padded to 641×641 during training and inference. We train our model for 35k iterations with a poly learning rate policy, where the learning rate is multiplied by (1 − iter/iter_max)^0.9 at each iteration, with an initial learning rate of 0.001 for all experiments. The batch size is 32, and an Adam (Kingma and Ba 2014) optimizer with β1 = 0.9, β2 = 0.999, and weight decay = 0 is used. Multi-scale training is adopted to obtain a strong baseline. We select three adjacent frames as reference frames unless otherwise specified.
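The poly learning-rate policy quoted above can be sketched as a small standalone function; the helper name `poly_lr` is illustrative (not from the paper), and only the hyperparameter values (initial LR 0.001, 35k iterations, power 0.9) come from the quoted setup.

```python
def poly_lr(base_lr: float, iteration: int, max_iterations: int, power: float = 0.9) -> float:
    """Poly decay: lr = base_lr * (1 - iteration / max_iterations) ** power."""
    return base_lr * (1.0 - iteration / max_iterations) ** power

# Settings reported in the paper: initial LR 0.001 over 35k iterations.
lr_start = poly_lr(0.001, 0, 35_000)      # full base LR at iteration 0
lr_mid = poly_lr(0.001, 17_500, 35_000)   # partially decayed at the midpoint
lr_end = poly_lr(0.001, 35_000, 35_000)   # decays to 0 at the final iteration
```

The paper's Adam hyperparameters (β1 = 0.9, β2 = 0.999, weight decay = 0) would be passed to whatever framework's Adam implementation is in use; since the report could not pin down the software stack, no framework-specific optimizer call is shown.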