Structure-Aware Spatial-Temporal Interaction Network for Video Shadow Detection
Authors: Housheng Wei, Guanyu Xing, Jingwei Liao, Yanci Zhang, Yanli Liu
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Quantitative experimental results demonstrate that our approach significantly outperforms the state-of-the-art methods, providing stable and consistent shadow detection results in complex video shadow scenarios. We evaluate the performance of the proposed algorithm on common video shadow detection datasets and compare it with state-of-the-art (SOTA) methods. |
| Researcher Affiliation | Academia | Housheng Wei¹, Guanyu Xing¹, Jingwei Liao², Yanci Zhang³, and Yanli Liu³; ¹National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University; ²Department of Information Sciences and Technology, George Mason University; ³College of Computer Science, Sichuan University |
| Pseudocode | No | The paper describes the network architecture and specific modules, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We utilize the currently popular Video Shadow Detection Dataset (ViSha) [Chen et al., 2021] to demonstrate the effectiveness of our study. This dataset comprises 120 scenes, with 50 used for training and 70 for testing. The backbone network of the proposed method is initialized using the pre-trained parameters of BEiT v2 [Peng et al., 2022] on the COCO-Stuff segmentation dataset [Caesar et al., 2018]. |
| Dataset Splits | No | The paper states that 50 scenes are used for training and 70 for testing for the Visha dataset. It does not explicitly mention a separate validation set or specific percentages for train/validation/test splits beyond this training and testing division. |
| Hardware Specification | Yes | The experiments are conducted with a batch size of 2, a total of 20,000 iterations, and are trained on a single NVIDIA RTX 3090 Ti with 24 GB of VRAM, taking approximately 10 hours and 30 minutes. |
| Software Dependencies | No | The experiments in this paper are conducted using the MMSegmentation [Contributors, 2020] segmentation framework and the PyTorch framework. While these frameworks are named, no version numbers are provided for them or for any other libraries/dependencies (a hedged MMSegmentation-style configuration sketch follows the table). |
| Experiment Setup | Yes | For parameter optimization, the backbone network of the proposed method is initialized using the pre-trained parameters of BEiT v2 [Peng et al., 2022] on the COCO-Stuff segmentation dataset [Caesar et al., 2018]. The rest of the parameters are randomly initialized using the Xavier method [Glorot and Bengio, 2010] and optimized during training. We use the AdamW optimizer [Loshchilov and Hutter, 2017] with an initial learning rate of 2e-5, a weight decay of 0.05, and a poly learning rate decay. The experiments are conducted with a batch size of 2 for a total of 20,000 iterations, trained on a single NVIDIA RTX 3090 Ti with 24 GB of VRAM, taking approximately 10 hours and 30 minutes (a plain-PyTorch reconstruction of this schedule is sketched after the table). |
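
To make the reported training schedule concrete, the snippet below sketches how these hyperparameters would typically appear in an MMSegmentation (0.x-style) schedule config. This is a hedged reconstruction rather than the authors' actual configuration: only the AdamW settings, the poly decay policy, the batch size of 2, and the 20,000-iteration budget come from the paper; the decay power, minimum learning rate, worker count, and checkpoint/evaluation intervals are assumptions.

```python
# Hypothetical MMSegmentation (0.x-style) schedule config reconstructed from the
# hyperparameters reported in the paper; values marked "assumed" are not stated there.
optimizer = dict(type='AdamW', lr=2e-5, weight_decay=0.05)   # reported
optimizer_config = dict()                                    # default gradient handling
lr_config = dict(
    policy='poly',       # reported: poly learning rate decay
    power=1.0,           # assumed; a common MMSegmentation choice
    min_lr=0.0,          # assumed
    by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=20000)       # reported: 20,000 iterations
data = dict(samples_per_gpu=2, workers_per_gpu=2)            # batch size 2 reported; workers assumed
checkpoint_config = dict(by_epoch=False, interval=2000)      # assumed
evaluation = dict(interval=2000, metric='mIoU')              # assumed
```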
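
Equivalently, outside the MMSegmentation framework, the same optimization schedule can be reproduced in plain PyTorch. The sketch below pairs AdamW (lr 2e-5, weight decay 0.05) with a LambdaLR implementing polynomial decay over the 20,000-iteration budget; the decay power of 1.0 and the placeholder model are assumptions, since the paper does not describe them.

```python
import torch
from torch import nn

# Placeholder model standing in for the shadow-detection network (assumption).
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

max_iters = 20_000   # reported total iterations
power = 1.0          # assumed poly decay exponent

# Reported optimizer settings: AdamW, initial lr 2e-5, weight decay 0.05.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.05)

# Poly decay: lr(t) = base_lr * (1 - t / max_iters) ** power.
# (torch.optim.lr_scheduler.PolynomialLR is an equivalent built-in alternative.)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda it: (1 - it / max_iters) ** power)

for it in range(max_iters):
    # ... forward pass, loss computation, and loss.backward() go here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()   # advance the poly schedule once per iteration
```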