Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation
Authors: Huihui Song, Tiankang Su, Yuhui Zheng, Kaihua Zhang, Bo Liu, Dong Liu
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed GFA achieves state-of-the-art performance on popular benchmarks. |
| Researcher Affiliation | Collaboration | Huihui Song (1), Tiankang Su (1), Yuhui Zheng (1,2), Kaihua Zhang (1), Bo Liu (3), Dong Liu (4). (1) B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China; (2) College of Computer, Qinghai Normal University, Xining 810016, China; (3) Walmart Global Tech, Sunnyvale, CA 94086, USA; (4) Netflix Inc., Los Gatos, CA 95032, USA |
| Pseudocode | No | The paper does not contain structured pseudocode or an algorithm block explicitly labeled as such. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | The training set consists of two parts: (a) all the training data in DAVIS-2016 (Perazzi et al. 2016), which contains 30 videos with about 2,000 frames; (b) a subset of 10,000 frames selected from YouTube-VOS 2018 (Xu et al. 2018) using a one-frame-every-10-frames sampling strategy. |
| Dataset Splits | No | The paper specifies the datasets used for training (DAVIS-2016 and YouTube-VOS 2018) and testing (DAVIS-2016, FBMS, and YouTube-Objects), but it does not provide explicit details on training/validation/test splits, percentages, or sample counts for validation data. |
| Hardware Specification | Yes | We use four NVIDIA 2080Ti GPUs with a batch size of 4 per GPU, for a total batch size of 16. |
| Software Dependencies | No | The paper mentions using SegFormer weights, RAFT for optical flow, and the AdamW optimizer, but it does not specify version numbers for its software dependencies (e.g., PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | All images are resized to 512 × 512 × 3 pixels, and RAFT (Teed and Deng 2020) is adopted to estimate optical flow. The data augmentation strategies include random rotation, random horizontal flipping, random cropping, and color enhancement during training. We use four NVIDIA 2080Ti GPUs with a batch size of 4 per GPU, for a total batch size of 16. During training, the model is optimized using the AdamW optimizer (Loshchilov and Hutter 2018) with a cosine decay schedule. The initial learning rate and weight decay are both set to 1e-4. The trade-off factor λ in the segmentation loss is set to 0.5, and the model directly yields the binary segmentation mask without any post-processing technique. |
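The Open Datasets row above describes how the YouTube-VOS 2018 subset is built: one frame is kept every 10 frames to obtain roughly 10,000 training frames. Below is a minimal sketch of that subsampling step, assuming frames have already been extracted to per-video directories of JPEG files; the directory layout and helper names are illustrative assumptions, not taken from the paper.

```python
# Sketch of the one-frame-every-10-frames sampling reported for YouTube-VOS 2018.
# Directory layout and function names are assumptions for illustration only.
from pathlib import Path

def subsample_video_frames(video_dir: Path, stride: int = 10) -> list[Path]:
    """Return every `stride`-th frame from a directory of extracted frames."""
    frames = sorted(video_dir.glob("*.jpg"))
    return frames[::stride]

def build_subset(root: str, stride: int = 10) -> list[Path]:
    """Collect the sampled frames across all training videos under `root`."""
    subset: list[Path] = []
    for video_dir in sorted(Path(root).iterdir()):
        if video_dir.is_dir():
            subset.extend(subsample_video_frames(video_dir, stride))
    return subset
```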
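The Experiment Setup row lists the reported training hyper-parameters: 512 × 512 inputs, random rotation / horizontal flip / cropping / color enhancement, AdamW with learning rate 1e-4 and weight decay 1e-4 under a cosine decay schedule, a total batch size of 16, and a loss trade-off factor λ = 0.5. The PyTorch sketch below wires those reported values together; the model, loss terms, epoch count, and augmentation strengths are placeholders not specified in the quoted text, and a real VOS pipeline would apply the geometric transforms jointly to frames, optical flow, and masks rather than to images alone.

```python
# Minimal training-configuration sketch based on the hyper-parameters quoted above.
# The network, loss terms, epoch count, and augmentation strengths are assumed
# placeholders; they are not given in the quoted text.
import torch
import torch.nn as nn
from torchvision import transforms

IMG_SIZE = 512          # images resized to 512 x 512 x 3
TOTAL_BATCH = 16        # 4 GPUs x batch size 4 per GPU
LAMBDA = 0.5            # trade-off factor in the segmentation loss
NUM_EPOCHS = 50         # assumption: the paper does not state the epoch count

# Image-only augmentation for illustration (rotation, flip, crop, color enhancement);
# the rotation angle, crop scale, and jitter strengths are assumed values.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(IMG_SIZE, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
])

model = nn.Conv2d(3, 1, kernel_size=3, padding=1)   # stand-in for the segmentation network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=NUM_EPOCHS)
bce = nn.BCEWithLogitsLoss()

for epoch in range(NUM_EPOCHS):
    # Dummy batch standing in for RGB frames and binary ground-truth masks.
    frames = augment(torch.rand(TOTAL_BATCH, 3, IMG_SIZE, IMG_SIZE))
    masks = torch.randint(0, 2, (TOTAL_BATCH, 1, IMG_SIZE, IMG_SIZE)).float()

    logits = model(frames)
    main_loss = bce(logits, masks)
    aux_loss = bce(logits, masks)           # placeholder for the second loss term weighted by λ
    loss = main_loss + LAMBDA * aux_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                        # cosine decay of the learning rate
```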