CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Authors: Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on the YouTube-VIS dataset validate the effectiveness of the proposed CompFeat. We conduct extensive experiments and ablation study on YouTube-VIS (Yang, Fan, and Xu 2019) to demonstrate the effectiveness of our proposed framework and each of the individual components.
Researcher Affiliation | Collaboration | 1 University of Illinois at Urbana-Champaign, 2 ByteDance Inc, 3 University of Oregon
Pseudocode | No | The paper includes architectural diagrams but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'Our model is implemented based on MMDetection (Chen et al. 2019)' but does not provide an explicit statement or link to open-source code for the proposed CompFeat method.
Open Datasets | Yes | YouTube-VIS is the first and largest dataset for video instance segmentation, which is a subset of the YouTube-VOS dataset (Xu et al. 2018). ... We choose MSCOCO (Lin et al. 2014) as external data, which has a large overlap in object categories with YouTube-VIS.
Dataset Splits | Yes | Since only the validation set is available for evaluation, all results reported in this paper are evaluated on the validation set.
Hardware Specification | Yes | Our model is implemented based on MMDetection (Chen et al. 2019) and the whole framework is trained end-to-end in 12 epochs with two NVIDIA 2080TI GPUs.
Software Dependencies | No | The paper states 'Our model is implemented based on MMDetection (Chen et al. 2019)' but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | During training, the initial learning rate is set to 0.0125 and decays with a factor of 10 at epochs 8 and 11. For each input frame, we randomly select three frames from the same video, two used as support frames in the dual attention module and the other used as reference frame in the tracking module. (A sketch of this schedule and sampling rule follows the table.)
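The Experiment Setup row can be made concrete with a short sketch. The snippet below is illustrative only: the function names (`lr_at_epoch`, `sample_training_frames`) and the example video length are assumptions, not the authors' released code; only the 0.0125 initial learning rate, the decay by a factor of 10 at epochs 8 and 11 over 12 epochs, and the three-frame support/reference sampling rule come from the paper.

```python
import random

def lr_at_epoch(epoch, base_lr=0.0125, decay_epochs=(8, 11), gamma=0.1):
    """Stepped schedule: start at 0.0125, multiply by 0.1 at epochs 8 and 11."""
    lr = base_lr
    for step in decay_epochs:
        if epoch >= step:
            lr *= gamma
    return lr

def sample_training_frames(frame_ids, key_frame):
    """For a given key frame, randomly pick three other frames from the same video:
    two support frames (dual attention module) and one reference frame (tracking module)."""
    candidates = [f for f in frame_ids if f != key_frame]
    support_a, support_b, reference = random.sample(candidates, 3)
    return (support_a, support_b), reference

if __name__ == "__main__":
    # 12-epoch schedule as reported in the paper
    print([lr_at_epoch(e) for e in range(12)])
    # Hypothetical video with 20 frames, key frame index 5
    print(sample_training_frames(list(range(20)), key_frame=5))
```

In an MMDetection-based setup such as the one the paper describes, the schedule would more typically be expressed through the training config (optimizer learning rate, step decay epochs, total epochs) rather than hand-rolled code; the sketch above only spells out the reported values.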