Iteratively Selecting an Easy Reference Frame Makes Unsupervised Video Object Segmentation Easier

Authors: Youngjo Lee, Hongje Seong, Euntai Kim
Pages: 1245-1253

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From the proposed framework, we achieve state-of-the-art performance in three UVOS benchmark sets: DAVIS16, FBMS, and SegTrack-V2. ... Experiments / Implementation Details: We use PFPN (Wang et al. 2020) as our SOD network. For mask prediction, we use STM (Oh et al. 2019). ResNet50 (He et al. 2016) is used as a backbone network that estimates the mask's quality in EFS. ... Datasets: We conduct experiments on three benchmark datasets. DAVIS16 (Perazzi et al. 2016) ... FBMS (Ochs, Malik, and Brox 2013) ... SegTrack-V2 (Li et al. 2013).
Researcher Affiliation | Academia | Youngjo Lee, Hongje Seong, and Euntai Kim*, School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea; {lzozo95, hjseong, etkim}@yonsei.ac.kr
Pseudocode | No | The paper describes its methods in prose and figures, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statements about releasing source code or providing a link to a code repository for the methodology described.
Open Datasets | Yes | Datasets: We conduct experiments on three benchmark datasets. DAVIS16 (Perazzi et al. 2016) is a representative dataset of VOS. ... FBMS (Ochs, Malik, and Brox 2013) is a frequently used dataset to evaluate UVOS models. ... SegTrack-V2 (Li et al. 2013) is another UVOS benchmark.
Dataset Splits | Yes | To evaluate our model, we use official metrics of DAVIS16: region similarity J, boundary accuracy F, and temporal stability T. ... Among them, we evaluate our model with the validation set who has 20 sequences as other papers do. ... FBMS is composed of 59 sequences. Among them, 30 sequences are used as a validation set.
Hardware Specification | No | The paper mentions that ResNet50 is used and reports inference speed in seconds per image, but it does not specify the GPU or CPU models, memory, or other hardware details used for the experiments.
Software Dependencies | No | The paper names the models and architectures it uses (e.g., PFPN, STM, ResNet50) and the datasets, but it does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other libraries).
Experiment Setup | Yes | If nothing is mentioned, the experiments are conducted using the number of easy frames as two and the number of iterations as four. ... We use PFPN (Wang et al. 2020) as our SOD network. For mask prediction, we use STM (Oh et al. 2019). ResNet50 (He et al. 2016) is used as a backbone network that estimates the mask's quality in EFS. ... The network $F(\cdot)$, which estimates $S$, is trained with an image dataset named DUTS (Wang et al. 2017). $S$ of each image-mask pair is estimated as $\hat{S}_i = F(\mathrm{concat}(x_i, \hat{y}_i))$. ... we filter out the frames with abnormal size objects through the area of the saliency mask. ... $th_{small}$ and $th_{large}$ are thresholds of small and large object respectively.
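The region similarity J listed under Dataset Splits is the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask. The minimal Python/NumPy sketch below computes per-frame J and its mean over one sequence. It assumes binary masks of equal shape, does not cover boundary accuracy F or temporal stability T, and scores a frame where both masks are empty as 1, a convention we believe matches the official DAVIS evaluation code but treat here as an assumption. The function names are hypothetical.

```python
import numpy as np


def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index (IoU) between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: scored as a perfect match (assumed convention).
        return 1.0
    inter = np.logical_and(pred, gt).sum()
    return float(inter) / float(union)


def mean_region_similarity(preds, gts) -> float:
    """Mean J over the frames of one sequence (frame-level averaging only)."""
    scores = [region_similarity(p, g) for p, g in zip(preds, gts)]
    if not scores:
        raise ValueError("need at least one frame")
    return float(np.mean(scores))
```

For example, the masks [[1, 1, 0], [0, 1, 0]] and [[1, 0, 0], [0, 1, 1]] share 2 foreground pixels out of a union of 4, so `region_similarity` returns 0.5. How per-frame scores are aggregated into the reported benchmark numbers is not shown here.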
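The Experiment Setup entry describes easy-frame selection in pieces: a network F scores each image-mask pair, frames whose saliency-mask area is abnormally small or large are discarded using th_small and th_large, two easy frames are selected, and the procedure runs for four iterations. The Python sketch below strings these steps together under stated assumptions; it is not the authors' implementation. Assumptions: area is measured as the foreground fraction of the image, a higher score means an easier frame, the threshold values are placeholders, `score_fn` stands in for F, `propagate_fn` stands in for the STM-based mask prediction, and each iteration re-scores the current masks and re-predicts all masks from the selected reference frames. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def mask_area_ratio(mask: np.ndarray) -> float:
    """Fraction of pixels marked as foreground in a binary mask."""
    return float(np.asarray(mask, dtype=bool).mean())


def select_easy_frames(
    masks: Sequence[np.ndarray],
    scores: Sequence[float],
    num_easy: int = 2,
    th_small: float = 0.01,  # hypothetical placeholder, not from the paper
    th_large: float = 0.7,   # hypothetical placeholder, not from the paper
) -> List[int]:
    """Indices of the `num_easy` highest-scoring frames whose mask area
    lies in [th_small, th_large] (inclusive bounds are an assumption)."""
    if len(masks) != len(scores):
        raise ValueError("need exactly one score per mask")
    candidates = [
        i for i, m in enumerate(masks)
        if th_small <= mask_area_ratio(m) <= th_large
    ]
    if not candidates:
        # Assumption: if the area filter removes every frame, fall back to all frames.
        candidates = list(range(len(masks)))
    # Assumption: a higher estimated quality score means an easier frame.
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return ranked[:num_easy]


def iterative_refinement(
    frames: Sequence,
    initial_masks: Sequence[np.ndarray],
    score_fn: Callable,      # hypothetical stand-in for the quality network F
    propagate_fn: Callable,  # hypothetical stand-in for STM mask prediction
    num_easy: int = 2,
    num_iters: int = 4,
) -> Tuple[List[np.ndarray], List[int]]:
    """Alternate easy-frame selection and mask re-prediction."""
    masks = list(initial_masks)  # e.g. saliency masks from the SOD network
    refs: List[int] = []
    for _ in range(num_iters):
        scores = [score_fn(f, m) for f, m in zip(frames, masks)]
        refs = select_easy_frames(masks, scores, num_easy)
        # Re-predict every frame's mask from the selected reference frames.
        masks = list(propagate_fn(frames, refs, masks))
    return masks, refs
```

With a stub `propagate_fn` that returns its input masks unchanged, the loop re-selects the same reference frames on every iteration, which is a convenient way to test the selection logic in isolation before plugging in real models.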