Multi-View Dynamic Reflection Prior for Video Glass Surface Detection
Authors: Fang Liu, Yuhao Liu, Jiaying Lin, Ke Xu, Rynson W.H. Lau
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that VGSD-Net outperforms state-of-the-art approaches adapted from related fields. |
| Researcher Affiliation | Academia | Department of Computer Science, City University of Hong Kong {fawnliu2333, yuhaoliu7456, csjylin, kkangwing}@gmail.com, Rynson.Lau@cityu.edu.hk |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured algorithm blocks in a code-like format. |
| Open Source Code | No | Code and dataset will be available at https://github.com/fawnliu/VGSD. |
| Open Datasets | No | We have also created the first large-scale video glass surface dataset (VGSD-D), consisting of 19,166 image frames with accurately-annotated glass masks extracted from 297 videos. Code and dataset will be available at https://github.com/fawnliu/VGSD. |
| Dataset Splits | No | They are randomly divided into a training set (12,315 frames from 192 videos) and a testing set (6,851 frames from 105 videos). |
| Hardware Specification | Yes | We build our method using PyTorch toolbox and conduct all experiments on a Tesla V100 GPU with 32 GB memory. |
| Software Dependencies | No | The paper mentions building the method using the 'PyTorch toolbox' but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We adopt the Adam optimizer with a weight decay of 5×10⁻⁴ and a maximum learning rate of 5×10⁻⁵. The cosine learning rate scheduler and warm-up are used to adjust the learning rate. The batch size and training epochs are 5 and 15. The input images were randomly flipped horizontally and were resized to 416×416 for network training. We employ ResNeXt-101 (Xie et al. 2017) pre-trained on ImageNet as the encoder. We set the number of masked deformable blocks in LRE to m = 4. The window size and k in MDA are empirically set to 7×7 and 4. |
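The learning-rate policy described in the setup (maximum rate 5×10⁻⁵, cosine decay with warm-up over 15 epochs) can be sketched in plain Python as below. This is a minimal illustration, not the authors' code: the warm-up length is an assumption, as the paper does not state it.

```python
import math

# Values taken from the quoted experiment setup.
MAX_LR = 5e-5
EPOCHS = 15
WARMUP_EPOCHS = 1  # assumed: the paper does not specify the warm-up length


def learning_rate(epoch):
    """Cosine learning-rate schedule with linear warm-up."""
    if epoch < WARMUP_EPOCHS:
        # Linear ramp from 0 up to the maximum learning rate.
        return MAX_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay from MAX_LR toward 0 over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * MAX_LR * (1 + math.cos(math.pi * progress))


# Per-epoch learning rates over the full 15-epoch run.
schedule = [learning_rate(e) for e in range(EPOCHS)]
```

In a PyTorch training loop this would correspond to `torch.optim.Adam` with `weight_decay=5e-4` combined with a cosine annealing scheduler; the pure-Python form above just makes the shape of the schedule explicit.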