You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation
Authors: Dezhuang Li, Ruoqi Li, Lijun Wang, Yifan Wang, Jinqing Qi, Lu Zhang, Ting Liu, Qingquan Xu, Huchuan Lu (pp. 1297-1305)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two popular RVOS benchmarks have verified the effectiveness of our method. We first perform an overall comparison with state-of-the-art methods on the RVOS benchmark datasets, followed by the ablative studies to verify our main contributions. |
| Researcher Affiliation | Collaboration | 1 Dalian University of Technology, Dalian, China 2 Meitu Inc., China {Merci, dutlrq77}@mail.dlut.edu.cn, {ljwang, wyfan, jinqing}@dlut.edu.cn, luzhangdut@gmail.com, {lt, xqq}@meitu.com, lhchuan@dlut.edu.cn |
| Pseudocode | No | The paper describes its modules with block diagrams (Figure 2, 3) and mathematical formulations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | We learn our method using the training sets of Refer-YouTube-VOS (Seo, Lee, and Han 2020), Refer-DAVIS2017 (Khoreva, Rohrbach, and Schiele 2018), and RefCOCO (Nagaraja, Morariu, and Davis 2016). |
| Dataset Splits | Yes | Table 1 shows the comparison results on Refer-DAVIS2017 validation set. At each iteration, we randomly sample 4 frames within a temporal window size of 100 from a training video, serving as the input to the network. |
| Hardware Specification | Yes | The proposed method runs at 10 FPS per object on NVIDIA 1080TI GPU, which has a good trade-off between efficiency and accuracy. |
| Software Dependencies | No | The paper mentions several components like ResNet-50, BERT model, Lovasz segmentation loss, and Adam optimizer, but does not specify their versions or the versions of underlying software frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We empirically set the hyper-parameter λ in (1) to 0.01. We set the memory size N to 3. At each iteration, we randomly sample 4 frames within a temporal window size of 100 from a training video... The whole network is end-to-end trained using the Lovasz segmentation loss... Adam optimizer... is adopted with a batch size of 4. We first train our network for 70 epochs... The default learning rate is 2e-4 which decays by 0.2 in the 40th epoch. Then the whole network is jointly trained for another 80 epochs. The default learning rate here is 2e-5 which decays by 0.2 in the 25th, 75th epoch. |
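The two-stage learning-rate schedule quoted above can be made concrete with a small sketch. The function below is purely illustrative (the name `learning_rate` and the stage encoding are our own, not from the paper); it reproduces the stated schedule: stage 1 starts at 2e-4 and decays by a factor of 0.2 at epoch 40, while stage 2 starts at 2e-5 and decays by 0.2 at epochs 25 and 75.

```python
def learning_rate(stage: int, epoch: int) -> float:
    """Piecewise-constant LR schedule as described in the paper's setup.

    stage 1 (70 epochs): base LR 2e-4, multiplied by 0.2 from epoch 40 on.
    stage 2 (80 epochs): base LR 2e-5, multiplied by 0.2 at epochs 25 and 75.
    """
    if stage == 1:
        lr = 2e-4
        if epoch >= 40:
            lr *= 0.2
    elif stage == 2:
        lr = 2e-5
        for milestone in (25, 75):
            if epoch >= milestone:
                lr *= 0.2
    else:
        raise ValueError("stage must be 1 or 2")
    return lr
```

In a PyTorch setup this would typically be implemented with a multi-step scheduler attached to the Adam optimizer, but since the paper does not state its framework, the schedule is shown framework-agnostically.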