Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation
Authors: Dezhuang Li, Ruoqi Li, Lijun Wang, Yifan Wang, Jinqing Qi, Lu Zhang, Ting Liu, Qingquan Xu, Huchuan Lu
AAAI 2022, pp. 1297-1305 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two popular RVOS benchmarks have verified the effectiveness of our method. We first perform an overall comparison with state-of-the-art methods on the RVOS benchmark datasets, followed by the ablative studies to verify our main contributions. |
| Researcher Affiliation | Collaboration | 1 Dalian University of Technology, Dalian, China 2 Meitu Inc., China EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes its modules with block diagrams (Figure 2, 3) and mathematical formulations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | We learn our method using the training sets of Refer-YouTube-VOS (Seo, Lee, and Han 2020), Refer-DAVIS2017 (Khoreva, Rohrbach, and Schiele 2018), and RefCOCO (Nagaraja, Morariu, and Davis 2016). |
| Dataset Splits | Yes | Table 1 shows the comparison results on Refer-DAVIS2017 validation set. At each iteration, we randomly sample 4 frames within a temporal window size of 100 from a training video, serving as the input to the network. |
| Hardware Specification | Yes | The proposed method runs at 10 FPS per object on NVIDIA 1080TI GPU, which has a good trade-off between efficiency and accuracy. |
| Software Dependencies | No | The paper mentions several components like ResNet50, BERT model, Lovasz segmentation loss, and Adam optimizer, but does not specify their versions or the versions of underlying software frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We empirically set the hyper-parameter λ in (1) to 0.01. We set the memory size N to 3. At each iteration, we randomly sample 4 frames within a temporal window size of 100 from a training video... The whole network is end-to-end trained using the Lovasz segmentation loss... Adam optimizer... is adopted with a batch size of 4. We first train our network for 70 epochs... The default learning rate is 2e-4 which decays by 0.2 in the 40th epoch. Then the whole network is jointly trained for another 80 epochs. The default learning rate here is 2e-5 which decays by 0.2 in the 25th, 75th epoch. |
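The Experiment Setup quote above describes a two-phase step-decay learning-rate schedule. The following is a minimal sketch of that schedule as stated (base LR 2e-4 decayed by 0.2 at epoch 40 for the first 70-epoch stage; base LR 2e-5 decayed by 0.2 at epochs 25 and 75 for the subsequent 80-epoch joint stage). The function name and the `phase` encoding are illustrative, not from the paper, and the paper does not specify the framework or exact scheduler implementation used.

```python
def lr_at_epoch(epoch: int, phase: int) -> float:
    """Step-decay LR schedule as reported in the paper's experiment setup.

    phase 1: initial training (70 epochs), base LR 2e-4, x0.2 at epoch 40.
    phase 2: joint training (80 epochs), base LR 2e-5, x0.2 at epochs 25 and 75.
    """
    if phase == 1:
        lr = 2e-4
        if epoch >= 40:
            lr *= 0.2
    elif phase == 2:
        lr = 2e-5
        for milestone in (25, 75):
            if epoch >= milestone:
                lr *= 0.2
    else:
        raise ValueError("phase must be 1 or 2")
    return lr
```

In a framework such as PyTorch, the same schedule would typically be expressed with a MultiStepLR-style scheduler (milestones `[40]` and `[25, 75]`, gamma 0.2) attached to the Adam optimizer the paper mentions.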