Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

Authors: Yongqing Liang, Xin Li, Navid Jafari, Qin Chen

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 experiments. Table 1: The quantitative evaluation on the validation set of the DAVIS17 benchmark [28] in percentages.
Researcher Affiliation | Academia | Yongqing Liang, Xin Li, Navid Jafari, Louisiana State University, {ylian16, xinli, njafari}@lsu.edu; Qin Chen, Northeastern University, q.chen@northeastern.edu
Pseudocode | No | The paper describes algorithmic steps in prose but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Codes are available at https://github.com/xmlyqing00/AFB-URR.
Open Datasets | Yes | We evaluated our model (AFB-URR) on DAVIS17 [28] and YouTube-VOS18 [35], two large-scale VOS benchmarks with multiple objects. Pretraining on image datasets [5, 29, 16, 19, 6] (136,032 images in total).
Dataset Splits | Yes | DAVIS17 contains 60 training videos and 30 validation videos. YouTube-VOS18 (YV) contains 3,471 training videos and 474 videos for validation.
Hardware Specification | Yes | We implemented our framework in PyTorch [26] and conducted experiments on a single NVIDIA 1080Ti GPU. STM [25] evaluated their work on an NVIDIA V100 GPU with 16GB memory, while we evaluated ours on a weaker machine (one NVIDIA 1080Ti GPU with 11GB memory).
Software Dependencies | No | The paper mentions "PyTorch [26]" and the "AdamW [21] optimizer" but does not specify their version numbers or other software dependencies with versions.
Experiment Setup | Yes | The input frames are randomly resized and cropped into 400×400 px for all training. For each training sample, we randomly select at most 3 objects for training. We minimize our loss using the AdamW [21] optimizer (β = (0.9, 0.999), eps = 10^-8, and the weight decay is 0.01). The initial learning rate is 10^-5 for pretraining and 4×10^-6 for main training.
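To make the reported optimizer settings concrete, the following is a minimal, dependency-free sketch of a single AdamW update (decoupled weight decay, Loshchilov & Hutter) for one scalar parameter, with hyperparameters defaulting to the values quoted above (β = (0.9, 0.999), eps = 10^-8, weight decay 0.01, pretraining learning rate 10^-5). The function name and scalar formulation are illustrative, not taken from the paper's code; the actual implementation uses PyTorch's built-in optimizer.

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (illustrative sketch).

    Defaults mirror the hyperparameters reported in the paper's setup.
    Returns the updated parameter and moment estimates.
    """
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for step t
    v_hat = v / (1 - b2 ** t)
    # Decoupled weight decay: subtracted from the parameter directly,
    # not folded into the gradient (this is what distinguishes AdamW from Adam).
    theta = (theta
             - lr * m_hat / (math.sqrt(v_hat) + eps)
             - lr * weight_decay * theta)
    return theta, m, v

# Example: first optimizer step with gradient 0.5 from parameter 1.0
theta, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, t=1)
```

In practice this corresponds to `torch.optim.AdamW(params, lr=1e-5, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01)` for pretraining, with `lr=4e-6` for main training.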