Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation

Authors: Huihui Song, Tiankang Su, Yuhui Zheng, Kaihua Zhang, Bo Liu, Dong Liu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that the proposed GFA achieves state-of-the-art performance on popular benchmarks."
Researcher Affiliation | Collaboration | Huihui Song (1), Tiankang Su (1), Yuhui Zheng (1, 2), Kaihua Zhang (1), Bo Liu (3), Dong Liu (4); (1) B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China; (2) College of Computer, Qinghai Normal University, Xining 810016, China; (3) Walmart Global Tech, Sunnyvale, CA 94086, USA; (4) Netflix Inc., Los Gatos, CA 95032, USA
Pseudocode | No | The paper does not contain structured pseudocode or an algorithm block explicitly labeled as such.
Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository for the described methodology.
Open Datasets | Yes | "The training set consists of two parts: (a) all the training data in DAVIS-2016 (Perazzi et al. 2016), which contains 30 videos with about 2,000 frames; (b) a subset of 10,000 frames selected from YouTube-VOS 2018 (Xu et al. 2018) using a one-frame-every-10-frames sampling strategy."
Dataset Splits | No | The paper specifies the datasets used for training (DAVIS-2016 and YouTube-VOS 2018) and testing (DAVIS-2016, FBMS, and YouTube-Objects), but it does not provide explicit training/validation/test splits, percentages, or sample counts for validation data.
Hardware Specification | Yes | "We use four NVIDIA 2080 Ti GPUs with a batch size of 4 per GPU, for a total batch size of 16."
Software Dependencies | No | The paper mentions using SegFormer weights, RAFT for optical flow, and the AdamW optimizer, but it does not specify version numbers for its software dependencies (e.g., PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | "All images are resized to 512×512×3 pixels, and RAFT (Teed and Deng 2020) is adopted to estimate optical flow. The data augmentation strategies include random rotation, random horizontal flip, random cropping, and color enhancement during training. We use four NVIDIA 2080 Ti GPUs with a batch size of 4 per GPU, for a total batch size of 16. During training, the model is optimized using the AdamW optimizer (Loshchilov and Hutter 2018) with a cosine decay schedule. The initial learning rate and weight decay are both set to 1e-4. The trade-off factor λ in the segmentation loss is set to 0.5, and the model directly yields the binary segmentation mask without any post-processing technique."
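The one-frame-every-10-frames sampling used to build the YouTube-VOS 2018 training subset can be sketched in plain Python. This is a minimal illustration of the stated strategy; the function name, the cap handling, and the toy frame paths are assumptions, since the paper provides no code.

```python
def sample_every_nth(frame_paths, n=10, limit=10_000):
    """Select one frame every n frames, capped at `limit` frames total.

    Illustrates the sampling strategy reported in the paper: a subset of
    10,000 frames drawn from YouTube-VOS 2018 at a stride of 10.
    """
    sampled = frame_paths[::n]   # keep every n-th frame
    return sampled[:limit]       # cap the subset size

# Toy usage with synthetic frame names (3,000 frames -> 300 sampled)
frames = [f"video_{i // 100}/frame_{i % 100:05d}.jpg" for i in range(3000)]
subset = sample_every_nth(frames, n=10, limit=10_000)
print(len(subset))  # 300
```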
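The reported optimization settings (AdamW, initial learning rate 1e-4, weight decay 1e-4, cosine decay schedule) can be summarized with a standard cosine-decay formula. A minimal sketch follows; the total step count is an assumption, as the paper does not state the number of training iterations, and this is the generic cosine schedule rather than the authors' exact implementation.

```python
import math

# Settings reported in the paper; TOTAL_STEPS is an assumed placeholder.
INIT_LR = 1e-4
WEIGHT_DECAY = 1e-4
TOTAL_STEPS = 100_000

def cosine_decay_lr(step, init_lr=INIT_LR, total_steps=TOTAL_STEPS):
    """Standard cosine decay: lr falls from init_lr toward 0 over total_steps."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * init_lr * (1.0 + math.cos(math.pi * progress))

print(cosine_decay_lr(0))            # initial lr: 1e-4
print(cosine_decay_lr(TOTAL_STEPS))  # decayed to ~0 at the end of training
```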