Multi-View Dynamic Reflection Prior for Video Glass Surface Detection

Authors: Fang Liu, Yuhao Liu, Jiaying Lin, Ke Xu, Rynson W.H. Lau

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that VGSD-Net outperforms state-of-the-art approaches adapted from related fields.
Researcher Affiliation | Academia | Department of Computer Science, City University of Hong Kong. {fawnliu2333, yuhaoliu7456, csjylin, kkangwing}@gmail.com, Rynson.Lau@cityu.edu.hk
Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured algorithm blocks in a code-like format.
Open Source Code | No | Code and dataset will be available at https://github.com/fawnliu/VGSD.
Open Datasets | No | We have also created the first large-scale video glass surface dataset (VGSD-D), consisting of 19,166 image frames with accurately-annotated glass masks extracted from 297 videos. Code and dataset will be available at https://github.com/fawnliu/VGSD.
Dataset Splits | No | They are randomly divided into a training set (12,315 frames from 192 videos) and a testing set (6,851 frames from 105 videos).
Hardware Specification | Yes | We build our method using the PyTorch toolbox and conduct all experiments on a Tesla V100 GPU with 32 GB memory.
Software Dependencies | No | The paper mentions building the method 'using PyTorch toolbox' but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | We adopt the Adam optimizer with a weight decay of 5×10⁻⁴ and a maximum learning rate of 5×10⁻⁵. The cosine learning rate scheduler and warm-up are used to adjust the learning rate. The batch size and training epochs are 5 and 15. The input images were randomly flipped horizontally and were resized to 416×416 for network training. We employ ResNeXt-101 (Xie et al. 2017) pre-trained on ImageNet as the encoder. We set the number of masked deformable blocks in LRE to m = 4. The window size and k in MDA are empirically set to 7×7 and 4.
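
For reference, the reported experiment setup maps onto a minimal PyTorch training-configuration sketch, shown below. This is not the authors' released code: the backbone variant (torchvision's resnext101_32x8d), the one-epoch warm-up length, and the placeholder prediction head are assumptions, and the LRE and MDA modules of VGSD-Net are not reproduced here.

```python
# Sketch of the reported training configuration: Adam with weight decay 5e-4,
# max learning rate 5e-5, cosine schedule with warm-up, batch size 5, 15 epochs,
# 416x416 inputs with random horizontal flips, ImageNet-pretrained ResNeXt-101 encoder.
# The model head, warm-up length, and backbone variant are placeholders/assumptions.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T

# ImageNet-pretrained ResNeXt-101 backbone used as the encoder
# (32x8d variant assumed; the paper only cites Xie et al. 2017).
backbone = models.resnext101_32x8d(
    weights=models.ResNeXt101_32X8D_Weights.IMAGENET1K_V1
)
encoder = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc

# Hypothetical stand-in for the full VGSD-Net (decoder, LRE and MDA modules omitted).
model = nn.Sequential(encoder, nn.Conv2d(2048, 1, kernel_size=1))

# Adam optimizer with the reported weight decay and maximum learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, weight_decay=5e-4)

# Cosine annealing over 15 epochs preceded by a short linear warm-up
# (the warm-up length is not reported; one epoch is assumed here).
NUM_EPOCHS = 15
warmup_epochs = 1
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(
            optimizer, start_factor=0.1, total_iters=warmup_epochs
        ),
        torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=NUM_EPOCHS - warmup_epochs
        ),
    ],
    milestones=[warmup_epochs],
)

# Training-time augmentation: random horizontal flip and resize to 416x416.
train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.Resize((416, 416)),
    T.ToTensor(),
])

BATCH_SIZE = 5  # reported batch size
```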